Abstract. The Hales-Jewett theorem asserts that for every $r$ and every $k$ there exists $n$
such that every $r$-colouring of the $n$-dimensional grid $\{1,\dots,k\}^n$ contains a combinatorial
line. This result is a generalization of van der Waerden's theorem, and it is one of the
fundamental results of Ramsey theory. The theorem of van der Waerden has a famous
density version, conjectured by Erdős and Turán in 1936, proved by Szemerédi in 1975,
and given a different proof by Furstenberg in 1977. The Hales-Jewett theorem has a
density version as well, proved by Furstenberg and Katznelson in 1991 by means of a
significant extension of the ergodic techniques that had been pioneered by Furstenberg
in his proof of Szemerédi's theorem. In this paper, we give the first elementary proof of
the theorem of Furstenberg and Katznelson, and the first to provide a quantitative bound
on how large $n$ needs to be. In particular, we show that a subset of $\{1,2,3\}^n$ of density
$\delta$ contains a combinatorial line if $n$ is at least as big as a tower of 2s of height $O(1/\delta^2)$.
Our proof is surprisingly simple: indeed, it gives arguably the simplest known proof of
Szemerédi's theorem.



1. Introduction 

1.1. Statement of our main result. The purpose of this paper is to give the first elementary
proof of the density Hales-Jewett theorem. This theorem, first proved by Furstenberg
and Katznelson [FK89, FK91], has the same relation to the Hales-Jewett theorem [HJ63]
as Szemerédi's theorem [Sze75] has to van der Waerden's theorem [vdW27]. Before we go
any further, let us state all four theorems. We shall use the notation $[k]$ to stand for the
set $\{1,2,\dots,k\}$. If $X$ is a set and $r$ is a positive integer, then an $r$-colouring of $X$ will
mean a function $\kappa : X \to [r]$. A subset $Y$ of $X$ is called monochromatic if $\kappa(y)$ is the same
for every $y \in Y$.

We begin with van der Waerden's theorem. 

Theorem 1.1. For every pair of positive integers k and r there exists N such that for 
every r-colouring of [N] there is a monochromatic arithmetic progression of length k. 

Szemerédi's theorem is the density version of van der Waerden's theorem. That is, it
says that in van der Waerden's theorem one can always find an arithmetic progression in
any colour class that is used reasonably often.

Theorem 1.2. For every positive integer $k$ and every $\delta > 0$ there exists $N$ such that every
subset $A \subset [N]$ of size at least $\delta N$ contains an arithmetic progression of length $k$.

The reason it is called a density version is that we think of $|A|/N$ as the density of $A$
inside $[N]$, so the condition on $A$ is that it has density at least $\delta$.
To state the Hales-Jewett theorem, we need a little more terminology. The theorem is
concerned with subsets of $[k]^n$, elements of which we refer to as points (or strings). Instead
of looking for arithmetic progressions, the Hales-Jewett theorem looks for structures known
as combinatorial lines. There are many equivalent ways of defining these, of which one is
the following. Let $[n]$ be partitioned into sets $X_1,\dots,X_k,W$ in such a way that $W$ is
non-empty. Then take the set of all points $x$ such that $x_i = j$ whenever $j \le k$ and $i \in X_j$,
and $x_i$ takes the same value for every $i \in W$. The only choice we have in specifying such
an $x$ is the value we assign to the coordinates $x_i$ with $i \in W$, so each line contains $k$ points.

Here is a simple example of a combinatorial line when k = 3 and n = 8: 

{(1, 3, 1, 2, 2, 1, 1, 2), (2, 3, 2, 2, 2, 2, 1, 2), (3, 3, 3, 2, 2, 3, 1, 2)} 

In this case the sets $X_1, X_2, X_3$ and $W$ are $\{7\}$, $\{4,5,8\}$, $\{2\}$, and $\{1,3,6\}$, respectively.

The coordinates in $X_1 \cup \dots \cup X_k$ are called the fixed coordinates of the line, and the
coordinates in $W$ are the variable coordinates or wildcards.

Another way of thinking of a line is as an element of the set $([k] \cup \{*\})^n$, where at least
one coordinate takes the wildcard value *. To obtain the k points in the line, one lets j 
run from 1 to k and sets all the wildcards equal to j. For instance, in this notation the 
line above is 

(*,3,*,2,2,*,1,2). 
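
The wildcard description translates directly into a few lines of code. The following is a minimal sketch, not part of the original argument: the function names `line_points` and `partition_of`, and the use of the string `'*'` for the wildcard symbol, are illustrative choices. It expands a template into the $k$ points of the line and recovers the partition $X_1, \dots, X_k, W$ from it.

```python
def line_points(template, k):
    """Return the k points of the combinatorial line described by `template`.

    `template` is a sequence over {1, ..., k} and the wildcard '*';
    at least one coordinate must be a wildcard.
    """
    if '*' not in template:
        raise ValueError("a combinatorial line needs at least one wildcard")
    # Setting every wildcard to j, for j = 1, ..., k, gives the k points.
    return [tuple(j if c == '*' else c for c in template) for j in range(1, k + 1)]


def partition_of(template, k):
    """Recover (X_1, ..., X_k) and W, using 1-based coordinates as in the text."""
    fixed = [{i + 1 for i, c in enumerate(template) if c == j} for j in range(1, k + 1)]
    wild = {i + 1 for i, c in enumerate(template) if c == '*'}
    return fixed, wild


# The example line from the text, with k = 3 and n = 8.
template = ('*', 3, '*', 2, 2, '*', 1, 2)
points = line_points(template, 3)
fixed, wild = partition_of(template, 3)
```

Applied to the template above, `points` lists the three strings of the example and `wild` is the wildcard set $\{1,3,6\}$.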

With both these ways of thinking of combinatorial lines, it is clear that there is a close
relationship between lines in $[k]^n$ and points in $[k+1]^n$. Indeed, if one allows "degenerate
lines" in which the wildcard sets are empty then there is an obvious one-to-one correspondence
between the two sets. This will be very important to us later.

We are now ready to state the Hales-Jewett theorem.

Theorem 1.3. For every pair of positive integers $k$ and $r$ there exists a positive number
$\mathrm{HJ}(k,r)$ such that for every $n \ge \mathrm{HJ}(k,r)$ and every $r$-colouring of the set $[k]^n$ there is a
monochromatic combinatorial line.

As with van der Waerden's theorem, we may consider the density version of the
Hales-Jewett theorem, where the density of $A \subset [k]^n$ is $|A|/k^n$. The following theorem was first
proved by Furstenberg and Katznelson [FK91].

Theorem 1.4. For every positive integer $k$ and every real number $\delta > 0$ there exists a
positive integer $\mathrm{DHJ}(k,\delta)$ such that if $n \ge \mathrm{DHJ}(k,\delta)$ and $A$ is any subset of $[k]^n$ of density
at least $\delta$, then $A$ contains a combinatorial line.

We sometimes write "DHJ$_k$" to mean the $k$ case of this theorem. The first nontrivial
case, DHJ$_2$, is a weak version of Sperner's theorem [Spe28]; we discuss this further in
Section 2. We also remark that the Hales-Jewett theorem easily implies van der Waerden's
theorem, and likewise for the density versions. To see this, temporarily interpret $[m]$ as
$\{0,1,\dots,m-1\}$ rather than $\{1,2,\dots,m\}$, and identify integers in $[N]$ with their base-$k$
representation in $[k]^n$. It is then easy to see that a combinatorial line in $[k]^n$ corresponds
to an arithmetic progression of length $k$ in $[N]$: if the wildcard set of the line is $S$, then
the common difference of the progression is $\sum_{i \in S} k^{n-i}$. However, only very few arithmetic
progressions of length $k$ in $[N]$ arise in this way, so finding combinatorial lines is strictly
harder than finding arithmetic progressions. (Further evidence for this is that several
other results are easy consequences of the Hales-Jewett theorem and its density version: in
particular, it is an exercise to deduce the multidimensional Szemerédi theorem from DHJ.)
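
The passage from lines to progressions can be checked mechanically. The sketch below is only an illustration under one stated assumption: the first coordinate is read as the most significant base-$k$ digit, so that coordinate $i$ carries weight $k^{n-i}$. The helper names `to_integer` and `progression_from_line` are hypothetical.

```python
def to_integer(point, k):
    """Read a point of {0, ..., k-1}^n as a base-k integer, first digit most significant."""
    value = 0
    for digit in point:
        value = value * k + digit
    return value


def progression_from_line(template, k):
    """Map the k points of a line (wildcard '*', digits 0..k-1) to integers."""
    points = [tuple(j if c == '*' else c for c in template) for j in range(k)]
    return [to_integer(p, k) for p in points]


# A line in {0,1,2}^8 with wildcard set S = {1, 3, 6} (1-based coordinates).
template = ('*', 2, '*', 1, 1, '*', 0, 1)
k, n = 3, len(template)
terms = progression_from_line(template, k)
# Under the digit convention above, each wildcard coordinate i contributes k^(n-i).
expected_difference = sum(k ** (n - i) for i, c in enumerate(template, start=1) if c == '*')
differences = [b - a for a, b in zip(terms, terms[1:])]
```

Every entry of `differences` equals `expected_difference`, so the three integers form an arithmetic progression of length 3. Conversely, most progressions in $[N]$ involve carries between digits and so do not come from any line.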

In this paper, we give a new, elementary proof of the density Hales-Jewett theorem,
very different from that of Furstenberg and Katznelson (though the discovery of one part
of the argument, sketched in §5.4, was in part inspired by ergodic methods). Our proof
gives rise to the first known quantitative bounds for the theorem. Define the tower function
$T(n)$ inductively by taking $T(1) = 2$ and $T(n) = 2^{T(n-1)}$ (so for instance $T(4) = 2^{16} =
65536$). More generally, define (not quite standardly) the $k$th function in the Ackermann
hierarchy by setting $A_k(1) = 2$ and $A_k(n) = A_{k-1}(A_k(n-1))$, with $A_1(n) = 2n$. Thus, the
$k$th function is obtained by iterating the $(k-1)$st function, so $A_2(n) = 2^n$ and $A_3(n) = T(n)$.
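
To get a feel for these definitions, they can be evaluated directly for very small arguments. This is a minimal sketch; the function names `tower` and `ackermann` are illustrative, and the values grow so quickly that only tiny inputs are feasible.

```python
def tower(n):
    """T(1) = 2 and T(n) = 2 ** T(n - 1)."""
    value = 2
    for _ in range(n - 1):
        value = 2 ** value
    return value


def ackermann(k, n):
    """A_1(n) = 2n, A_k(1) = 2 and A_k(n) = A_{k-1}(A_k(n - 1)) for k > 1."""
    if k == 1:
        return 2 * n
    value = 2  # A_k(1)
    for _ in range(n - 1):
        # One more application of the (k-1)st function.
        value = ackermann(k - 1, value)
    return value
```

For instance, `ackermann(2, 10)` is $2^{10}$, `ackermann(3, 4)` agrees with `tower(4)`, and already `ackermann(4, 3)` equals $A_3(4) = 65536$.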

Theorem 1.5. In the density Hales-Jewett theorem, one may take $\mathrm{DHJ}(3,\delta) = T(O(1/\delta^2))$.
For $k \ge 4$, the bound $\mathrm{DHJ}(k,\delta)$ we achieve is broadly comparable to the function $A_k(1/\delta)$.

By "broadly comparable" we mean something like that it is much nearer to $A_k(1/\delta)$
than to $A_{k+1}(1/\delta)$. In fact, the bound we obtain is something like $A_k(A_{k-1}(1/\delta))$. (To give
an idea, if we were to apply a composition of this kind to the function $A_{k-1}(n) = 2^n$, then
$A_k(n)$ would be a tower of height $n$, whereas $A_k(A_{k-1}(n))$ would be a tower of height $2^n$.)

Another way of phrasing our result is in terms of the number $c_{n,3}$, the cardinality of the
largest subset of $[3]^n$ without a combinatorial line. Theorem 1.5 states that $c_{n,3}/3^n \le
O(1/\sqrt{\log^* n})$. The only known lower bounds appear in a parallel paper to this one
by an overlapping set of authors [Pol09]: in that paper it is shown that $c_{n,3} =
2, 6, 18, 52, 150, 450$ for $n = 1, 2, 3, 4, 5, 6$, and for large $n$ that $c_{n,3}/3^n \ge \exp(-O(\sqrt{\log n}))$.
Generalizing to DHJ$_k$, the authors show that $c_{n,k}/k^n \ge \exp(-O(\log n)^{1/\lceil \log_2 k \rceil})$, using ideas
from recent work on the construction of Behrend [Beh46].

1.2. The motivation for finding a new proof. Why is it interesting to give a new
proof of the density Hales-Jewett theorem? There are two main reasons. The first is
connected with the history of results and techniques in this area. One of the main benefits
of Furstenberg's proof of Szemerédi's theorem was that it introduced a technique (ergodic
methods) that could be developed in many directions, which did not seem to be the case
with Szemerédi's proof. As a result, several far-reaching generalizations of Szemerédi's
theorem were proved [BL96, FK78, Fur85, FK91], and for a long time nobody could prove
them in any other way than by using Furstenberg's methods. In the last few years that
has changed, and a programme has developed to find new and finitary proofs of the results
that were previously known only by infinitary ergodic methods; see, e.g., [RS04, NRS06,
RS06, RS07b, RS07a, Gow06, Gow07, Tao06, Tao07]. Giving a non-ergodic proof of the
density Hales-Jewett theorem was seen as a key goal for this programme, especially since
Furstenberg and Katznelson's ergodic proof seemed significantly harder than the ergodic
proof of Szemerédi's theorem. Having given a purely finitary proof, we are able to obtain
explicit bounds for how large $n$ needs to be as a function of $\delta$ and $k$ in the density
Hales-Jewett theorem. Such bounds could not be obtained via the ergodic methods even in
principle, since these proofs rely on the Axiom of Choice. Admittedly, our explicit bounds
are not particularly good: we start with a tower-type dependence for $k = 3$ and go up
a level of the Ackermann hierarchy each time we go from $k$ to $k+1$. However, they are
in line with several other bounds in the area. For example, the best known bounds for
the multidimensional Szemerédi theorem [Gow07, NRS06] (which is an easy consequence
of DHJ) are also of this type.

A second reason that a new proof of the density Hales-Jewett theorem is interesting is
that it immediately implies Szemerédi's theorem, and finding a new proof of Szemerédi's
theorem seems always to be illuminating, or at least this has been the case for the
four main approaches discovered so far (combinatorial [Sze75], ergodic [Fur77, FKO82],
Fourier [Gow01], hypergraph removal [Gow06, Gow07, RS04, NRS06]). Surprisingly, in
view of the fact that DHJ is considerably more general than Szemerédi's theorem and
the ergodic-theory proof of DHJ is considerably more complicated than the ergodic-theory
proof of Szemerédi's theorem, the new proof we have discovered gives arguably the simplest
proof yet known of Szemerédi's theorem. It seems that by looking at a more general
problem we have removed some of the difficulty. Related to this is another surprise. We
started out by trying to prove the first difficult case of the theorem, DHJ$_3$. The experience
of all four of the earlier proofs of Szemerédi's theorem has been that interesting ideas are
needed to prove results about progressions of length 3, but significant extra difficulties
arise when one tries to generalize an argument from the length-3 case to the general case.
Unexpectedly, it turned out that once we had proved the case $k = 3$ of the density
Hales-Jewett theorem, it was straightforward to generalize the argument to the $k \ge 4$ cases. We
do not fully understand why our proof should be different in this respect, but it is perhaps
a sign that the density Hales-Jewett theorem is at a "natural level of generality".

One might ask, if this is the case, why the proof of Furstenberg and Katznelson seems
to be more complicated than the ergodic-theoretic proofs of Szemerédi's theorem and its
multidimensional version. An explanation for this discrepancy is that our proof appears
to be genuinely different from theirs (that is, not just a translation of their proof into a
more elementary language). The clearest sign of this is that they use Carlson's theorem,
a powerful result in Ramsey theory, in an essential way, whereas we have no need of any
colouring results in our argument (unless you count the occasional use of the pigeonhole
principle).

Before we start working towards the proof of the theorem, we would like briefly to mention
that it was proved in a rather unusual "open source" way, which is why it is being published
under a pseudonym. The work was carried out by several researchers, who wrote their
thoughts, as they had them, in the form of blog comments at http://gowers.wordpress.com.
Anybody who wanted to could participate, and at all stages of the process the comments
were fully open to anybody who was interested. (Indeed, taking some inspiration from
a few of these blog comments, Austin provided another new (ergodic) proof of the density
Hales-Jewett theorem [Aus09].) This open process was in complete contrast to the
usual way that results are proved in private and presented in a finished form. The blog
comments are still available, so although this paper is a polished account of the DHJ$_k$
argument, it is possible to read a record of the entire thought process that led to the proof.
The constructions of new lower bounds for the DHJ$_k$ problem, mentioned in Section 1.1,
are being published by a partially overlapping set of researchers [Pol09]. The participants
in the project also created a wiki, http://michaelnielsen.org/polymath1/, which contains
sketches of the arguments, links to the blog comments, and a great deal of related material.

1.3. Combinatorial subspaces and multidimensional DHJ. We know from the density
Hales-Jewett theorem that dense subsets of $[k]^n$ contain combinatorial lines. It is
natural to wonder whether there is a higher-dimensional version of this result, in which
one finds $d$-dimensional subspaces. Such a result does indeed exist, and is a straightforward
consequence of DHJ, as was observed by Furstenberg and Katznelson. Since we shall need
this extension, we briefly define the relevant concepts and give the proof.

A $d$-dimensional combinatorial subspace is just like a combinatorial line except that
there are $d$ wildcard sets instead of just one. In other words we partition the ground set
$[n]$ into $k + d$ sets $X_1,\dots,X_k,W_1,\dots,W_d$ such that $W_1,\dots,W_d$ are non-empty, and the
subspace consists of all sequences $x$ such that $x_i = j$ whenever $i \in X_j$ and $x$ is constant
on each set $W_r$. There is an obvious isomorphism between $[k]^d$ and any $d$-dimensional
combinatorial subspace: the sequence $z = (z_1,\dots,z_d)$ is sent to the sequence $x$ such that
$x_i = j$ whenever $i \in X_j$ and $x_i = z_r$ whenever $i \in W_r$.

Note that there is an obvious injection from the set of all $d$-dimensional combinatorial
subspaces of $[k]^n$ to $[k+d]^n$ (which becomes a bijection if one allows the subspaces to be
degenerate).

The multidimensional density Hales-Jewett theorem is the following.

Theorem 1.6. For every $\delta > 0$ and every pair of integers $k$ and $d$ there exists a positive
integer $\mathrm{MDHJ}(k,d,\delta)$ such that, for every $n \ge \mathrm{MDHJ}(k,d,\delta)$, every subset $A \subset [k]^n$ of
density at least $\delta$ contains a $d$-dimensional combinatorial subspace of $[k]^n$.

We shall refer to this theorem as MDHJ, and for each $k$ we shall refer to the result for
that $k$ as MDHJ$_k$.

Proposition 1.7. For every $k$, MDHJ$_k$ follows from DHJ$_k$.

Proof. We prove the result by induction on $d$. Suppose we know MDHJ$_k$ for dimension
$d - 1$, and let $A \subset [k]^n$ have density at least $\delta$. Let $m = \mathrm{MDHJ}(k, d-1, \delta/2)$, and write a
typical string $z \in [k]^n$ as $(x,y)$, where $x \in [k]^m$ and $y \in [k]^{n-m}$. Call a string $y \in [k]^{n-m}$
"good" if $A_y = \{x \in [k]^m : (x,y) \in A\}$ has density at least $\delta/2$ within $[k]^m$. Let $G \subset [k]^{n-m}$
be the set of good $y$'s. Then the density of $G$ within $[k]^{n-m}$ must be at least $\delta/2$, or $A$
could not have density at least $\delta$ in $[k]^n$.

By induction, for any good $y$ the set $A_y$ contains a $(d-1)$-dimensional combinatorial
subspace. There are at most $M = (k+d-1)^m$ such subspaces, because of the injection
mentioned above. Therefore, there must be some subspace $\sigma \subset [k]^m$ such that the set

$G_\sigma = \{y \in [k]^{n-m} : (x,y) \in A \ \forall x \in \sigma\}$

has density at least $(\delta/2)/M$ within $[k]^{n-m}$. Provided that $n \ge m + \mathrm{DHJ}(k, \delta/2M)$, we may
conclude from DHJ$_k$ that $G_\sigma$ contains a combinatorial line, $\lambda$. Then $\sigma \times \lambda$ is the desired
$d$-dimensional subspace of $[k]^n$ that is contained in $A$. □

Because we have to iterate DHJ$_k$, with rapidly decreasing densities, in order to obtain
this result, the bound that we get from it is very bad indeed: it is this that causes the
Ackermann-type dependence on $k$ in our main theorem.
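
For reference, the induction in the proof of Proposition 1.7 can be summarized as a recursion for the bound. The display below only restates what the proof gives, with $m$ and $M$ as defined there. The base case uses the fact that a 1-dimensional combinatorial subspace is a combinatorial line.

```latex
\[
  \mathrm{MDHJ}(k,1,\delta) = \mathrm{DHJ}(k,\delta), \qquad
  \mathrm{MDHJ}(k,d,\delta) \le m + \mathrm{DHJ}\!\left(k, \frac{\delta}{2M}\right),
\]
\[
  \text{where } m = \mathrm{MDHJ}(k,d-1,\delta/2) \quad\text{and}\quad M = (k+d-1)^{m}.
\]
```

Since $M$ is exponential in $m$, each increase in $d$ feeds a density of roughly $\delta/(k+d-1)^m$ back into DHJ$_k$, which is the source of the rapid growth.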

1.4. Density-increment strategies. Very briefly, our proof of DHJ$_k$ follows a
density-increment strategy, a technique that was pioneered by Roth [Rot53] in his proof of the
$k = 3$ case of Szemerédi's theorem. There are now many such proofs in the literature, of
which most have the following form. One would like to prove that every dense subset $A$ of
a mathematical structure $S$ (such as an arithmetic progression or the set $[k]^n$) contains a
subset $X$ of a certain type (such as a subprogression of length $k$ or a combinatorial line). It
is usually hard to show this in one step, so instead one proves that if $A$ has density $\delta$ in $S$
and does not contain a subset of the desired kind, then $S$ has a substructure $S'$ such that
the density of $A$ inside $S'$ is at least $\delta + c$, where $c$ is some positive constant that depends
only on $\delta$. This is the density increment. If $S'$ is of a similar nature to $S$, then one can
iterate this argument, and if $S$ is large enough, then one can continue iterating until the
density exceeds 1 and one has a contradiction, from which one deduces that $A$ must after
all contain a subset $X$ of the desired kind.

Even obtaining a density increment on a substructure $S'$ directly in one step is
usually too hard, so typically there is an intermediate stage. First, one finds a set $T$ that
is in some sense "simple" such that the density of $A$ inside $T$ is at least $\delta + c$. Then one
proves that "simple" sets $T$ can be partitioned into substructures $S_1,\dots,S_N$ and uses an
averaging argument to show that the density of $A$ inside some $S_i$ is also at least $\delta + c$.
There are also variants of this: for instance, it is enough to find subsets $S_1,\dots,S_N$ of $T$
such that every element of $T$ is in the same number of the $S_i$, or even in approximately the
same number of the $S_i$.

A few proofs that have this basic structure are Roth's proof itself (where the intermediate
structure is a mod-$N$ arithmetic progression, which can be partitioned into genuine
arithmetic progressions), Gowers's proof of Szemerédi's theorem [Gow01], and an argument
of Shkredov [Shk06a, Shk06b] that gives strong bounds for the "corners problem", a result
that we shall discuss in detail in a later section.

2. Sperner's theorem and its multidimensional version 

The case $k = 2$ of the density Hales-Jewett theorem is equivalent to the following
statement: for every $\delta > 0$ there exists $n$ such that if $\mathcal{A}$ is a collection of at least $\delta 2^n$ subsets
of $[n]$ then there exist distinct sets $A, B \in \mathcal{A}$ such that $A \subset B$. The equivalence is easily
seen if one looks at the characteristic functions of the sets, in which case one sees that a
pair $(A, B)$ with $A \subset B$ corresponds to a combinatorial line in $\{0,1\}^n$.

Exact bounds are known for this theorem. The nicest proof is the following one, which
will have a considerable influence on our later proofs. Recall that an antichain is a collection
of sets such that no set in the collection is a proper subset of any other.

Theorem 2.1. Let $n$ be a positive integer and let $\mathcal{A}$ be an antichain of subsets of $[n]$. Then

\[ |\mathcal{A}| \le \binom{n}{\lfloor n/2 \rfloor}. \]

Proof. Consider the following way of choosing a random subset of $[n]$. One chooses a
random permutation $\pi$ of $[n]$ and a random integer $m \in \{0,1,\dots,n\}$ and takes the set
$A = \{\pi(1),\dots,\pi(m)\}$. Since $\mathcal{A}$ is an antichain, for each $\pi$ there is at most one $m$ such
that the resulting set belongs to $\mathcal{A}$. Thus, the probability of choosing a set in $\mathcal{A}$ is at most
$1/(n+1)$.

Now the probability of choosing a particular set $A$ of size $m$ is $(n+1)^{-1}\binom{n}{m}^{-1}$. Therefore,
if we want $\mathcal{A}$ to be as large as possible but for the probability of choosing a set in $\mathcal{A}$ to
be at most $(n+1)^{-1}$, then we must choose $\mathcal{A}$ to consist of sets of size $m$ such that $\binom{n}{m}$ is
maximized. It follows that we cannot choose more than $\binom{n}{\lfloor n/2 \rfloor}$ sets, as claimed. □
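
Both ingredients of this proof can be verified exhaustively for very small $n$. The following is a brute-force sketch for illustration only (the function names are hypothetical, and the search over all families is exponential, so it is feasible only for $n \le 4$). It computes the exact distribution of the random set $\{\pi(1),\dots,\pi(m)\}$ and the size of the largest antichain.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def selection_probabilities(n):
    """Exact law of {pi(1), ..., pi(m)} for uniform pi and uniform m in {0, ..., n}."""
    perms = list(permutations(range(1, n + 1)))
    weight = Fraction(1, len(perms) * (n + 1))
    prob = {}
    for pi in perms:
        for m in range(n + 1):
            chosen = frozenset(pi[:m])
            prob[chosen] = prob.get(chosen, 0) + weight
    return prob


def is_antichain(family):
    """True if no set in the family is a proper subset of another."""
    return not any(a < b or b < a for a, b in combinations(family, 2))


def largest_antichain(n):
    """Size of the largest antichain of subsets of [n], by exhaustive search."""
    subsets = [frozenset(c) for m in range(n + 1) for c in combinations(range(1, n + 1), m)]
    best = 0
    for mask in range(1 << len(subsets)):
        family = [s for i, s in enumerate(subsets) if mask >> i & 1]
        if len(family) > best and is_antichain(family):
            best = len(family)
    return best
```

For $n = 4$, every set of size $m$ is selected with probability exactly $1/(5\binom{4}{m})$, and the exhaustive search returns $\binom{4}{2} = 6$.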

We shall also need a multidimensional version of Sperner's theorem. This time we are
trying to maximize the size of $\mathcal{A}$ subject to the condition that it is not possible to find a
$d$-dimensional combinatorial subspace, which in set-theoretic terms means a collection of
disjoint non-empty sets $A, A_1, \dots, A_d$ such that $A \cup \bigcup_{i \in E} A_i \in \mathcal{A}$ for every $E \subset \{1,2,\dots,d\}$.
The result we need was proved by Gunderson, Rödl and Sidorenko. However, for the
convenience of the reader we give a proof here, which is somewhat simpler than theirs and
gives a slightly better bound. (This improvement has an imperceptible effect on our bound
for DHJ$_3$ though.)

We begin with an easy and standard lemma. As usual, if $X$ is a finite set and $Y$ is a
subset of $X$, we write $\mu(Y)$ for $|Y|/|X|$.

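Lemma 2.2. Let $X$ be a finite set and let $X_\gamma$ be a random subset of $X$, where $\gamma$ is an
element of a probability space $\Gamma$. Suppose that $\mathbb{E}_\gamma \mu(X_\gamma) = \delta$. Now let $\gamma$ and $\gamma'$ be chosen
independently from $\Gamma$. Then $\mathbb{E}_{\gamma,\gamma'} \mu(X_\gamma \cap X_{\gamma'}) \ge \delta^2$.

Proof. Let $\chi_\gamma$ be the characteristic function of $X_\gamma$. Then

\[
\begin{aligned}
\delta^2 &= \bigl(\mathbb{E}_\gamma \mu(X_\gamma)\bigr)^2 \\
         &= \bigl(\mathbb{E}_x \mathbb{E}_\gamma \chi_\gamma(x)\bigr)^2 \\
         &\le \mathbb{E}_x \bigl(\mathbb{E}_\gamma \chi_\gamma(x)\bigr)^2 \\
         &= \mathbb{E}_x \mathbb{E}_{\gamma,\gamma'} \chi_\gamma(x)\chi_{\gamma'}(x) \\
         &= \mathbb{E}_{\gamma,\gamma'} \mu(X_\gamma \cap X_{\gamma'}).
\end{aligned}
\]

The inequality above is Cauchy-Schwarz. The result follows. □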

Theorem 2.3. Let $\mathcal{A}$ be a collection of subsets of $[n]$ that contains no $d$-dimensional
combinatorial subspace. Then the density of $\mathcal{A}$ is at most $(25/n)^{1/2^d}$.

Proof. Let $\delta$ be the density of $\mathcal{A}$. (That is, $\mathcal{A}$ has cardinality $\delta 2^n$.) For $i = 1, 2, \dots, d-1$
let $n_i = \lfloor n/4^{d-i} \rfloor$ and let $n_d = n - (n_1 + \dots + n_{d-1})$. Note that $n_d \ge (2/3)n$.

Let us partition $[n]$ into sets $J_1 \cup \dots \cup J_{d-1} \cup E$ with $|J_i| = \lfloor n/4^{d-i} \rfloor$. Note that
$|E| \ge (2/3)n$.

Now consider the following way of choosing a random subset $A$ of $[n]$. First we choose
a random permutation $\pi$ of $[n]$. Then we choose a random integer $s$ according to the
binomial distribution with parameters $n_1$ and $1/2$. Next, we let $B$ be a random subset
of $\{\pi(n_1+1),\dots,\pi(n)\}$. Finally, we let $A$ be the set $\{\pi(1),\dots,\pi(s)\} \cup B$. The resulting
distribution on $A$ is uniform, as can be seen by conditioning on the set $\{\pi(1),\dots,\pi(n_1)\}$.

Let us write $A_{\pi,s}$ for the set $\{\pi(1),\dots,\pi(s)\}$ and $X_{\pi,s}$ for the set of all $B \subset \{\pi(n_1+1),\dots,\pi(n)\}$
such that $A_{\pi,s} \cup B \in \mathcal{A}$. Then the average density of $X_{\pi,s}$ (in the set of
all subsets of $\{\pi(n_1+1),\dots,\pi(n)\}$) is $\delta$. Therefore, by Lemma 2.2, if we first choose $\pi$
randomly and then choose $s$ and $t$ independently at random from the binomial distribution
as we did for $s$ above, then the average density of $X_{\pi,s} \cap X_{\pi,t}$ is at least $\delta^2$.

We would like $s$ and $t$ to be distinct. The probability that $s = t$ is equal to $2^{-2n_1}\binom{2n_1}{n_1}$
(since it is the same as the probability that $s + t = n_1$), which is well known to be at most
$n_1^{-1/2}$, which in turn is at most $2^{d-1}n^{-1/2}$. Therefore, the expected density of $X_{\pi,s} \cap X_{\pi,t}$
conditional on $s \ne t$ is at least $\delta^2 - 2^{d-1}n^{-1/2}$.

Let us choose $s < t$ such that $\mu(X_{\pi,s} \cap X_{\pi,t}) \ge \delta^2 - 2^{d-1}n^{-1/2}$, and let us write $A_0^{(1)}$ and
$A_1^{(1)}$ for $A_{\pi,s}$ and $A_{\pi,t}$. Note that $A_0^{(1)}$ is a proper subset of $A_1^{(1)}$, that both are disjoint from
the set $\{\pi(n_1+1),\dots,\pi(n)\}$ and that $A_0^{(1)} \cup B$ and $A_1^{(1)} \cup B$ both belong to $\mathcal{A}$ for every
$B \in X_{\pi,s} \cap X_{\pi,t}$.




D. H. J. POLYMATH 



Now let us run the argument again, with $n$ replaced by $n - n_1$, $n_1$ replaced by $n_2$ and $\mathcal{A}$
replaced by the set $\mathcal{A}_1 = X_{\pi,s} \cap X_{\pi,t}$. It gives us sets $A_0^{(2)}$ and $A_1^{(2)}$ and a set $\mathcal{A}_2$ of subsets
of $\{\pi(n_2+1),\dots,\pi(n)\}$ such that $A_0^{(2)}$ is a proper subset of $A_1^{(2)}$, both $A_0^{(2)}$ and $A_1^{(2)}$ are
disjoint from $\{\pi(n_2+1),\dots,\pi(n)\}$, both $A_0^{(2)} \cup B$ and $A_1^{(2)} \cup B$ belong to $\mathcal{A}_1$ for every
$B \in \mathcal{A}_2$, and the density of $\mathcal{A}_2$ is at least

[...]

where for the last inequality we used the fact that $\delta \le 1/2$.

If we continue this process and have shown that $\mathcal{A}_r$ has density at least [...],
then at the next stage we obtain $\mathcal{A}_{r+1}$ with density at least

[...]

Therefore, as long as [...], then by Sperner's theorem $\mathcal{A}_{d-1}$ contains
two sets $A_0^{(d)}$ and $A_1^{(d)}$, with $A_0^{(d)}$ a proper subset of $A_1^{(d)}$. This gives us the desired
combinatorial subspace (which consists of all sets of the form $A_{\epsilon_1}^{(1)} \cup \dots \cup A_{\epsilon_d}^{(d)}$ such that
each $\epsilon_i$ is either 0 or 1).

The inequality we need is true if $n \ge 25/\delta^{2^d}$, so the theorem is proved. □



3. Equal-slices measure and probabilistic DHJ 

The proof of Sperner's theorem can be regarded as follows. First, one chooses a different
measure on the power set of $[n]$, where to choose a set you first choose its cardinality $m$
uniformly at random from $\{0,1,2,\dots,n\}$ and you then choose a random set of size $m$.
The set of all subsets of $[n]$ of size $m$ is sometimes denoted by $[n]^{(m)}$ and called a layer or
slice of the cube. We therefore call the resulting probability measure on the power set of
$[n]$, or equivalently on $[2]^n$, the equal-slices measure.

This measure arises so naturally in the averaging argument that we used to prove
Sperner's theorem that it is tempting to say that the "real" theorem is that the maximum
possible equal-slices measure of an antichain is $1/(n+1)$. One then converts that
into a slightly artificial (and weaker) statement about the uniform measure.

The advantage of equal-slices measure is not just cosmetic, however: it and its obvious
generalization to $[k]^n$ will play a crucial role in our proof. Rather than saying straight away
why this should be, we shall prove a result using equal-slices measure and explain why it
would be problematic to give a uniform version.

But before we do that, let us give a formal definition of the equal-slices measure on $[k]^n$.
This time we choose, uniformly at random from all possibilities, a $k$-tuple $(a_1,\dots,a_k)$ of
non-negative integers that add up to $n$, and then we choose a sequence $x \in [k]^n$ such that
for each $j$ the set $X_j = \{i : x_i = j\}$ has cardinality $a_j$, again uniformly from all possibilities
(of which there are $\binom{n}{a_1,\dots,a_k}$).

The number of slices can be worked out by a "holes and pegs" argument: given any
subset $B = \{b_1,\dots,b_{k-1}\}$ of $\{1,2,\dots,n+k-1\}$ of size $k-1$, with $b_1 < \dots < b_{k-1}$, let $a_i$ be the
number of integers strictly between $b_{i-1}$ and $b_i$, where we treat $b_0$ as $0$ and $b_k$ as $n+k$. This gives us
all possible sequences $(a_1,\dots,a_k)$ exactly once each, so the number of slices is $\binom{n+k-1}{k-1}$.

For use in the proof of the next theorem, we note that if $k = 3$ then the number of slices
with $a_2 = 0$ is $n+1$, so the probability that $a_2 = 0$ is $(n+1)/\binom{n+2}{2}$, which equals $2/(n+2)$.
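
The holes-and-pegs bijection and the resulting measure are easy to check by enumeration. This is a small sketch under illustrative naming (`slices` and `equal_slices_weight` are not from the paper). It lists all slices via the bijection and computes the equal-slices probability of a single point of $[k]^n$ as $1/(\text{number of slices} \times \text{size of its slice})$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def slices(n, k):
    """All k-tuples of non-negative integers summing to n, via holes and pegs."""
    result = []
    for pegs in combinations(range(1, n + k), k - 1):
        bounds = (0,) + pegs + (n + k,)           # b_0 = 0 and b_k = n + k
        # a_i counts the integers strictly between b_{i-1} and b_i.
        result.append(tuple(bounds[i + 1] - bounds[i] - 1 for i in range(k)))
    return result


def equal_slices_weight(x, k):
    """Equal-slices probability of the point x in [k]^n (symbols 1, ..., k)."""
    n = len(x)
    multinomial, remaining = 1, n
    for j in range(1, k + 1):
        a_j = x.count(j)
        multinomial *= comb(remaining, a_j)
        remaining -= a_j
    return Fraction(1, comb(n + k - 1, k - 1) * multinomial)


all_slices = slices(5, 3)
degenerate_middle = [a for a in all_slices if a[1] == 0]
total_mass = sum(equal_slices_weight(x, 3) for x in product((1, 2, 3), repeat=4))
```

With $n = 5$ and $k = 3$ there are $\binom{7}{2} = 21$ slices, of which $6 = n + 1$ have $a_2 = 0$, and the weights of all points of $[3]^4$ sum to 1.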
We can easily define equal-slices measure for combinatorial lines as well. Indeed, there
is a one-to-one correspondence between lines in $[k]^n$ and points in $[k+1]^n$, at least if one
allows the lines to be degenerate. If $y \in [k+1]^n$, then the corresponding line consists of
all points of the form $y^{k+1 \to j}$ with $j \in [k]$ (that is, $y$ with every coordinate equal to $k+1$
replaced by $j$); in other words, the set of $i$ such that $y_i = k+1$
is treated as a wildcard set.

3.1. A probabilistic version of Sperner's theorem. As mentioned in the introduction,
our proof of DHJ uses a density-increment strategy: that is, we assume that $A$ does not
contain a line and deduce that $A$ has increased density inside some subspace. In almost
all known proofs of this kind, one can in fact get away with a weaker hypothesis. If $A$ is
a dense set inside which one wishes to find some structure, then one can find a density
increment on the assumption that $A$ has "too few" subsets of the kind one is looking for,
or more generally "the wrong number" of such subsets, where "the right number" is the
number you would expect if $A$ is a random subset of density $\delta$. Similarly, it is also possible
to find equivalent versions of the theorems that say that a set $A$ of density $\delta$ contains not
just one subset of the desired kind, but "many" such subsets, where this means that if you
choose a random such subset then with probability at least $c = c(\delta) > 0$ it will lie in $A$. A
statement like this is called a "probabilistic version" of the density theorem.

This is a sufficiently important feature of previously known arguments that it is initially
unsettling to observe that it is false for DHJ even when $k = 2$. The reason is a simple one.
By standard measure-concentration results, almost all points in $[2]^n$ have roughly $n/2$ 1s
and $n/2$ 2s. By the same results, almost all combinatorial lines have roughly $n/3$ fixed
1s, $n/3$ fixed 2s and $n/3$ variable coordinates. (A precise statement expressing this can be
found in Lemma 6.2 below.) It follows that there is a set of density almost 1 (the set of
sequences with roughly equal numbers of 1s and 2s) that contains only a tiny fraction of
all lines (ones with roughly $n/2$ fixed 1s, roughly $n/2$ fixed 2s and a very small wildcard
set).
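
The size of this effect can be computed exactly for moderate $n$. The sketch below is an illustration with arbitrarily chosen parameters ($n = 300$ and a window of $\pm 20$ around $n/2$, neither of which comes from the paper). It compares the uniform density of the balanced points with the fraction of non-degenerate lines both of whose points are balanced.

```python
from fractions import Fraction
from math import comb


def balanced_density(n, t):
    """Uniform density in [2]^n of the points whose number of 1s is within t of n/2."""
    good = sum(comb(n, a) for a in range(n + 1) if abs(2 * a - n) <= 2 * t)
    return Fraction(good, 2 ** n)


def line_fraction(n, t):
    """Fraction of non-degenerate lines (uniform measure) contained in the balanced set.

    A line with a fixed 1s and w >= 1 wildcards has two points, with a + w and a ones.
    """
    inside = 0
    for w in range(1, n + 1):
        for a in range(n - w + 1):
            if abs(2 * a - n) <= 2 * t and abs(2 * (a + w) - n) <= 2 * t:
                # Choose the wildcard set, then the fixed 1s among the rest.
                inside += comb(n, w) * comb(n - w, a)
    return Fraction(inside, 3 ** n - 2 ** n)


n, t = 300, 20
density = float(balanced_density(n, t))
fraction = float(line_fraction(n, t))
```

With these parameters the balanced set contains the large majority of points but only a very small fraction of lines, since a typical line has about $n/3$ fixed 1s rather than about $n/2$.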

However, this does not mean that there is no probabilistic version of DHJ, which is
fortunate as we shall need one later. It merely means that the uniform measure is the wrong
measure in which to express it. To illustrate this point, we now prove a "probabilistic"
version of DHJ$_2$. It tells us that an equal-slices-dense subset of $[2]^n$ must contain an
equal-slices-dense set of lines.

Theorem 3.1. Let $A$ be a subset of $[2]^n$ of equal-slices density $\delta$. Then the set of (possibly
degenerate) combinatorial lines in $A$ has equal-slices density at least $\delta^2(n+1)/(n+2)$.

Proof. Let $\pi$ be a random permutation of $[n]$ and let $s$ and $t$ be elements of $\{0,1,2,\dots,n\}$
chosen independently and uniformly at random. Let us write $x_{\pi,m}$ for the sequence that
takes the value 1 at $\pi(1),\dots,\pi(m)$ and 0 everywhere else (identifying $[2]$ with $\{0,1\}$), and
let $A_\pi$ be the number of the sequences $x_{\pi,m}$, $m \in \{0,1,\dots,n\}$, that belong to $A$. Then
$\mathbb{E} A_\pi = \delta(n+1)$, by the definition of equal-slices measure.

From this it follows that $\mathbb{E} A_\pi^2$ is at least $\delta^2(n+1)^2$. But $A_\pi^2$ is the number of pairs $(s,t)$ such
that both $x_{\pi,s}$ and $x_{\pi,t}$ belong to $A$. Therefore, if we choose a random pair $(x_{\pi,s}, x_{\pi,t})$ then
the probability that both its constituent sequences belong to $A$ is at least $\delta^2$.

Now each such pair forms a combinatorial line. If $s < t$, then this line consists of all
sequences $x$ such that $x_i = 1$ if $i \in \{\pi(1),\dots,\pi(s)\}$, $x_i = 0$ if $i \in \{\pi(t+1),\dots,\pi(n)\}$,
and $x$ is constant on the set $\{\pi(s+1),\dots,\pi(t)\}$. (Thus, the set $\{\pi(s+1),\dots,\pi(t)\}$ is
the wildcard set.) If $t < s$ then we simply interchange the roles of $s$ and $t$ in the above.
(If $t = s$ then we have a degenerate line and interchanging the roles of $s$ and $t$ makes no
difference.)

There is one technical detail that we need to address, which is that the probability p{i) 
that we choose a particular combinatorial line i is not quite the equal-slices probability q{i). 
In particular, the probability that the line is degenerate is (n + instead of 2(n + 2)~^. 
However, if we condition on the event that s ^ t, then we are choosing a random subset 
of {0, 1, 2, . . . , n} of size 2, and such pairs are in one-to-one correspondence with triples 
(ai, 02, cts) such that ai + a2 + a-s = n and 02 7^ 0. Thus, p{£) = {n + 2)q{i)/2{n + 1) if £ 



is degenerate, and = (1 - (n + l)~^)g(£)/(l - 2(n + 2)-^) = {n + 2)q{i)/{n + 1) if £ is 



non-degenerate. 

From the above calculation it follows that the set of lines in $A$ has equal-slices density
at least $\delta^2(n+1)/(n+2)$.

The equal-slices density of the set of degenerate lines is $O(n^{-1})$, so this result implies
that there is a dense set of non-degenerate combinatorial lines in $A$ as well.
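
As an illustration (not part of the original argument), the bound $\delta^2(n+1)/(n+2)$ can be checked exhaustively for very small $n$. The sketch below assumes that a line in $[2]^n$ is encoded as a point of $[3]^n$ with the symbol 3 playing the role of the wildcard; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def equal_slices_weight(x, k):
    """Equal-slices probability of the point x of [k]^n."""
    n = len(x)
    slice_size = factorial(n)
    for j in range(1, k + 1):
        slice_size //= factorial(x.count(j))
    # There are comb(n + k - 1, k - 1) slices, each equally likely.
    return Fraction(1, comb(n + k - 1, k - 1) * slice_size)


def density(A):
    """Equal-slices density of a set A of points of [2]^n."""
    return sum((equal_slices_weight(x, 2) for x in A), Fraction(0))


def line_density(A, n):
    """Equal-slices density of the (possibly degenerate) lines inside A."""
    total = Fraction(0)
    for z in product((1, 2, 3), repeat=n):
        low = tuple(1 if c == 3 else c for c in z)   # wildcard set to 1
        high = tuple(2 if c == 3 else c for c in z)  # wildcard set to 2
        if low in A and high in A:
            total += equal_slices_weight(z, 3)
    return total


def check_theorem(n):
    """Test the bound for every subset A of [2]^n."""
    points = list(product((1, 2), repeat=n))
    for mask in range(1 << len(points)):
        A = {p for i, p in enumerate(points) if mask >> i & 1}
        if line_density(A, n) < density(A) ** 2 * Fraction(n + 1, n + 2):
            return False
    return True
```

Exact rational arithmetic is used so that the comparison is not affected by rounding; the exhaustive loop is only feasible for $n \le 4$ or so.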

3.2. Non-degenerate equal-slices measure. For technical reasons, it is sometimes
convenient, when talking about equal-slices measure, to condition on the event that every
$j \in [k]$ is equal to $x_i$ for some $i$. Indeed, we have already seen in the proof of Theorem
3.1 that degenerate slices (that is, slices for which this condition does not hold) can be
slightly problematic. It turns out that if we condition on the slices not being degenerate,
then we can prove a useful lemma that would hold only approximately, and after tedious
consideration of the degenerate cases, if we used the equal-slices measure itself.

Let us therefore define the non-degenerate equal-slices measure on $[k]^n$ as follows. One
first chooses a random $k$-tuple of positive (rather than non-negative) integers $(a_1, \dots, a_k)$
that add up to $n$ and then a random sequence $x \in [k]^n$ such that $|X_j| = a_j$ for each $j$,
where as before $X_j$ is the set $\{i \in [n] : x_i = j\}$.

A helpful equivalent way of defining this measure is as follows. To select a random point
$x \in [k]^n$, one places $n$ points $q_1, \dots, q_n$ around a circle in a random order. That creates $n$
gaps between consecutive points. One chooses a random set of $k$ of these gaps and places
further points $r_1, \dots, r_k$ into the gaps, again in a random order. Finally, one sets $x_i$ to
be $j$ if and only if $r_j$ is the first point out of $r_1, \dots, r_k$ that you come to if you go round
clockwise starting at $q_i$.

Note that since the $q_i$ are in a random order, precisely the same distribution will arise
if the $r_j$ are placed in some fixed order rather than their order too being randomized.
However, it is more convenient to randomize everything. Note also that since we do not
allow two different $r_j$ to occupy the same gap, for each $j$ there exists $i$ such that $x_i = j$.
Finally, note that apart from this constraint, all slices are equally likely. Therefore, we
really do have the equal-slices measure conditioned on the event that the slices are
non-degenerate.
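
The equivalence of the two definitions can be confirmed by brute force for small parameters. The following sketch is an illustration only: it enumerates every circular arrangement, under the assumed convention that positions are numbered $0, \dots, n-1$ clockwise and that gap $g$ lies immediately after position $g$. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def nondegenerate_weight(x, k):
    """Non-degenerate equal-slices probability of x (first definition)."""
    n = len(x)
    counts = [x.count(j) for j in range(1, k + 1)]
    if min(counts) == 0:
        return Fraction(0)
    slice_size = factorial(n)
    for c in counts:
        slice_size //= factorial(c)
    # comb(n - 1, k - 1) = number of k-tuples of positive integers summing to n.
    return Fraction(1, comb(n - 1, k - 1) * slice_size)


def circle_point(order, labels):
    """order[p] is the index i of the point q_i at clockwise position p;
    labels[g] = j means that r_j sits in the gap just after position g."""
    n = len(order)
    x = [0] * n
    for p in range(n):
        g = p
        while g not in labels:      # walk clockwise to the first occupied gap
            g = (g + 1) % n
        x[order[p]] = labels[g]
    return tuple(x)


def circle_distribution(n, k):
    """Exact distribution of x under the circle construction."""
    counts = Counter()
    for order in permutations(range(n)):
        for gaps in combinations(range(n), k):
            for lab in permutations(range(1, k + 1)):
                counts[circle_point(order, dict(zip(gaps, lab)))] += 1
    total = sum(counts.values())
    return {x: Fraction(c, total) for x, c in counts.items()}
```

Every arrangement is counted once, so the resulting fractions are exact probabilities rather than Monte Carlo estimates.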

To see the effect that this conditioning has, let us give an upper bound for the probability
that a slice is degenerate.

Lemma 3.2. Let $x$ be an equal-slices random point of $[k]^n$. Then the probability that no
coordinate of $x$ is equal to $k$ is $\frac{k-1}{n+k-1}$. In particular, it is at most $k/n$.

Proof. To choose $k$ non-negative integers $a_1, \dots, a_k$ that add up to $n$, and to do so
uniformly from all possibilities, one can choose a random subset $P = \{p_1 < \dots < p_{k-1}\} \subset
\{1, 2, \dots, n+k-1\}$ of size $k-1$ (of "pegs") and let $a_i$ be the number of integers strictly
between $p_{i-1}$ and $p_i$, where we set $p_0 = 0$ and $p_k = n+k$. The probability that no
coordinate of $x$ is equal to $k$ is the probability that $a_k = 0$, which is the probability that
$n+k-1 \in P$, which is $\frac{k-1}{n+k-1}$, as claimed. □
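
The peg construction is easy to enumerate, which gives a direct check of the formula $\frac{k-1}{n+k-1}$ for small $n$ and $k$. This is a minimal sketch with a hypothetical function name, not part of the original proof.

```python
from fractions import Fraction
from itertools import combinations


def prob_last_value_unused(n, k):
    """Exact probability that a_k = 0 when (a_1, ..., a_k) is a uniformly
    random k-tuple of non-negative integers summing to n."""
    peg_sets = list(combinations(range(1, n + k), k - 1))  # subsets of {1..n+k-1}
    hits = 0
    for pegs in peg_sets:
        boundary = (0,) + pegs + (n + k,)                  # p_0 = 0, p_k = n + k
        a = [boundary[i] - boundary[i - 1] - 1 for i in range(1, k + 1)]
        assert sum(a) == n
        if a[-1] == 0:
            hits += 1
    return Fraction(hits, len(peg_sets))
```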

Corollary 3.3. Let $\nu$ and $\tilde\nu$ be the equal-slices and non-degenerate equal-slices measures
on $[k]^n$, respectively. Then for any set $A \subset [k]^n$ we have $|\nu(A) - \tilde\nu(A)| \le k^2/n$.

Proof. It follows from Lemma 3.2 that the probability that a slice is degenerate is at most
$k^2/n$. Therefore, if $A$ is a set that consists only of non-degenerate sequences, then its
non-degenerate equal-slices measure is $(1-c)^{-1}$ times its equal-slices measure, for some
$c \le k^2/n$. Therefore, for such a set, $0 \le \tilde\nu(A) - \nu(A) = c\tilde\nu(A) \le k^2/n$. If $A$ consists only
of degenerate sequences, then $0 \le \nu(A) - \tilde\nu(A) = \nu(A) \le k^2/n$. The result follows, since if
one takes a union of sets of the two different kinds, then the differences cancel out rather
than reinforcing each other. □
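
To see the size of the discrepancy in practice, one can compute both measures exactly for randomly chosen sets. The sketch below is illustrative only; the names `nu`, `nu_tilde` and `max_gap`, the sampling probability 1/2 and the seed are all assumptions made for the example.

```python
import random
from fractions import Fraction
from itertools import product
from math import comb, factorial


def slice_size(x, k):
    """Number of points of [k]^n in the same slice as x."""
    size = factorial(len(x))
    for j in range(1, k + 1):
        size //= factorial(x.count(j))
    return size


def nu(A, n, k):
    """Equal-slices measure of A."""
    slices = comb(n + k - 1, k - 1)
    return sum((Fraction(1, slices * slice_size(x, k)) for x in A), Fraction(0))


def nu_tilde(A, n, k):
    """Non-degenerate equal-slices measure of A (degenerate points get weight 0)."""
    slices = comb(n - 1, k - 1)
    return sum((Fraction(1, slices * slice_size(x, k))
                for x in A if len(set(x)) == k), Fraction(0))


def max_gap(n, k, trials, seed=0):
    """Largest |nu(A) - nu_tilde(A)| over some random subsets A of [k]^n."""
    rng = random.Random(seed)
    points = list(product(range(1, k + 1), repeat=n))
    worst = Fraction(0)
    for _ in range(trials):
        A = [p for p in points if rng.random() < 0.5]
        worst = max(worst, abs(nu(A, n, k) - nu_tilde(A, n, k)))
    return worst
```

Random sets typically give a much smaller gap than $k^2/n$; the extreme case is the set of all non-degenerate points, for which the gap equals the probability that a slice is degenerate.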

For later use, we slightly generalize Lemma 3.2.

Lemma 3.4. Let $x$ be chosen randomly from $[k]^n$ using the equal-slices distribution. Then
the probability that fewer than $m$ coordinates of $x$ are equal to $k$ is at most $mk/n$.

Proof. Let $P$ be as in the proof of Lemma 3.2. This time we are interested in the probability
that $p_{k-1} \ge n+k-m$. The number of sets $P$ with $p_{k-1} = n+k-s$ is $\binom{n+k-s-1}{k-2}$, which is at most
$\binom{n+k-2}{k-2}$, which as we noted in the proof of Lemma 3.2 is at most $\frac{k}{n}\binom{n+k-1}{k-1}$. The result
follows. □

Corollary 3.5. Let $x$ be chosen randomly from $[k]^n$ using the equal-slices distribution.
Then the probability that there exists $j \in [k]$ such that fewer than $m$ coordinates of $x$ are
equal to $j$ is at most $mk^2/n$.

Proof. This follows immediately from Lemma 3.4. □
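
Both bounds depend only on which slice is chosen, so they can be tested by enumerating slices. The following sketch (hypothetical function names, small parameters only) does this with the peg construction from the proof of Lemma 3.2.

```python
from fractions import Fraction
from itertools import combinations


def compositions(n, k):
    """All k-tuples of non-negative integers summing to n."""
    for pegs in combinations(range(1, n + k), k - 1):
        boundary = (0,) + pegs + (n + k,)
        yield tuple(boundary[i] - boundary[i - 1] - 1 for i in range(1, k + 1))


def prob_last_below(n, k, m):
    """Probability that fewer than m coordinates equal k (Lemma 3.4)."""
    comps = list(compositions(n, k))
    return Fraction(sum(1 for a in comps if a[-1] < m), len(comps))


def prob_some_below(n, k, m):
    """Probability that some value is taken fewer than m times (Corollary 3.5)."""
    comps = list(compositions(n, k))
    return Fraction(sum(1 for a in comps if min(a) < m), len(comps))
```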

Now let us return to our discussion of the non-degenerate equal-slices measure. The
next result tells us that it has a beautiful property. Let us use the expression $\tilde\nu$-random to
mean "random and chosen according to the non-degenerate equal-slices measure". Then
the property is that a $\tilde\nu$-random point in a $\tilde\nu$-random subspace with no fixed coordinates is
a $\tilde\nu$-random point. This result will enable us to carry out clean averaging arguments when
we are using equal-slices measure.

We have not said what we mean by a $\tilde\nu$-random subspace with no fixed coordinates,
but the definition is a straightforward modification of our earlier definition of the
equal-slices density of a set of combinatorial lines. First, a $d$-dimensional subspace with no
fixed coordinates is simply a subspace obtained by partitioning $[n]$ into $d$ non-empty sets
$X_1, \dots, X_d$ and taking the set of all sequences $x \in [k]^n$ that are constant on each $X_j$. For
brevity, let us call these special subspaces.

As we mentioned earlier, just as a combinatorial line in $[k]^n$ can be associated with a
point in $[k+1]^n$, so a $d$-dimensional combinatorial subspace in $[k]^n$ can be associated with
a point in $[k+d]^n$. If the subspace is special, then it will in fact be associated with a point
in $[d]^n$.

In the reverse direction, if $x \in [k+d]^n$, then the corresponding $d$-dimensional subspace
is the set of all points $y$ such that $y_i = j$ whenever $j \in [k]$ and $x_i = j$, and $y$ is constant on
all sets of the form $X_j = \{i : x_i = j\}$ when $j > k$. Thus, the wildcard sets are the $d$ sets
$X_{k+1}, \dots, X_{k+d}$. In the case of special subspaces, we take instead $x$ to belong to $[d]^n$ and
the wildcard sets are $X_1, \dots, X_d$.

Therefore, when we talk about the equal-slices measure or non-degenerate equal-slices
measure of a set of special $d$-dimensional subspaces, we are associating with each subspace
a point in $[d]^n$ and taking the corresponding measure there. (A small detail is that for this
to work we need the wildcard sets in the combinatorial subspace to form a sequence rather
than just a set. In other words, if we permute the "basis" then we are considering the
result as a different subspace, even though it consists of the same points. Alternatively,
one could regard the correspondence as being $d!$-to-one.)

Lemma 3.6. Let $n$, $k$ and $d$ be positive integers with $n > k + d$. Suppose that a point
$x \in [k]^n$ is chosen randomly by first choosing a $\tilde\nu$-random special $d$-dimensional subspace
$V$ of $[k]^n$ and then choosing a $\tilde\nu$-random point in $V$. Then the resulting distribution is the
non-degenerate equal-slices measure on $[k]^n$.

Proof. To prove this we use the second method of defining the non-degenerate equal-slices
measure. That is, we choose a random subspace as follows. First, we place $n$ points
$q_1, \dots, q_n$ in a random order around a circle. Next, we choose $d$ points $r_1, \dots, r_d$ and place
them in random gaps between the $q_i$, with no two of the $r_h$ occupying the same gap. Then
the wildcard set $X_h$ will consist of all $i$ such that $r_h$ is the first of the points $r_1, \dots, r_d$ if
you go clockwise round the circle from $q_i$. Let us call the set of points $q_i$ with this property,
together with $r_h$, the $h$th block.

How do we then choose a random point $x$ in this subspace? We can think of it as
follows. We take the $d$ blocks and randomly permute them. We then randomly place $k$
points $s_1, \dots, s_k$ in gaps between blocks (with no two $s_j$ in the same gap). Then $x_i = j$
if $s_j$ is the first of the points $s_1, \dots, s_k$ if you go clockwise round from $q_i$ (after the blocks
have been permuted).

Now consider a second way of choosing a random point in $[k]^n$. We proceed exactly as
above, except that this time we do not bother to permute the blocks. We claim that this
gives rise to exactly the same distribution.

To see this, let us call two valid arrangements of the points $q_1, \dots, q_n$ and $r_1, \dots, r_d$
equivalent if one is obtained from the other by a permutation of the blocks. Then all
the equivalence classes have size $d!$, so randomly choosing an arrangement is the same
as randomly choosing an arrangement and then randomly changing it to an equivalent
arrangement.

Now the second way of choosing a random sequence amounts to choosing the random
points $q_1, \dots, q_n$ and $r_1, \dots, r_d$, randomly choosing $k$ of the points $r_1, \dots, r_d$ and calling
them $s_1, \dots, s_k$ (in a random order) and finally using the points $q_1, \dots, q_n, s_1, \dots, s_k$ to
define a point in $[k]^n$ in the usual way. But this is precisely the non-degenerate equal-slices
measure on $[k]^n$. □
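
For small parameters the statement of Lemma 3.6 can be verified exactly by enumeration. In the sketch below (an illustration with hypothetical names), a special $d$-dimensional subspace is encoded as a point $z \in [d]^n$ using every value, a point of the subspace as a point $y \in [k]^d$, and the resulting sequence is $x_i = y_{z_i}$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb, factorial


def nondeg_weight(x, k):
    """Non-degenerate equal-slices probability of the point x of [k]^n."""
    n = len(x)
    counts = [x.count(j) for j in range(1, k + 1)]
    if min(counts) == 0:
        return Fraction(0)
    slice_size = factorial(n)
    for c in counts:
        slice_size //= factorial(c)
    return Fraction(1, comb(n - 1, k - 1) * slice_size)


def two_step_distribution(n, k, d):
    """Distribution of x obtained from a random special subspace z and then
    a random point y inside it, both non-degenerate equal-slices random."""
    dist = defaultdict(Fraction)
    for z in product(range(1, d + 1), repeat=n):
        wz = nondeg_weight(z, d)
        if wz == 0:
            continue
        for y in product(range(1, k + 1), repeat=d):
            wy = nondeg_weight(y, k)
            if wy == 0:
                continue
            x = tuple(y[zi - 1] for zi in z)   # value of y on the block containing i
            dist[x] += wz * wy
    return dist
```

The enumeration has size $d^n k^d$, so it is only a sanity check for tiny cases, but it uses exact fractions throughout.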

3.3. A probabilistic version of the density Hales-Jewett theorem. With the help
of Corollary 3.3 and Lemma 3.6 it is straightforward to prove that a probabilistic version
of DHJ$_k$ follows from an "equal-slices version". Let us begin by stating the equal-slices
version.

Theorem 3.7. For every $\delta > 0$ and every positive integer $k$ there exists $n$ such that every
set $A \subset [k]^n$ of equal-slices density at least $\delta$ contains a combinatorial line.

We shall show later that Theorem 3.7 follows from DHJ$_k$ itself. For now let us assume it
and deduce a probabilistic version. We shall write EDHJ$(k, \delta)$ for the smallest integer $m$
such that every subset $A \subset [k]^m$ of equal-slices density at least $\delta$ contains a combinatorial
line.

Theorem 3.8. Let $\delta > 0$ and let $k$ be an integer greater than or equal to 2. Then there
exists $\theta = \mathrm{PDHJ}(k, \delta)$ such that for every $n > \max\{m, 4k^2/\delta\}$ and every $A \subset [k]^n$ of
equal-slices density at least $\delta$ the equal-slices density of the set of combinatorial lines in $A$ is at
least $\theta$. Moreover, if $m = \mathrm{EDHJ}(k, \delta/4)$ then we can take $\theta = (\delta/9)(k+1)^{-m}$.

Proof. (Assuming Theorem 3.7.) By Corollary 3.3 the non-degenerate equal-slices density
$\tilde\nu(A)$ of $A$ is at least $\delta - k^2/n$. Since $n > 4k^2/\delta$, this is at least $3\delta/4$.

Let $V$ be a random $m$-dimensional special subspace of $[k]^n$, chosen according to the
non-degenerate equal-slices measure. Then Lemma 3.6 implies that the expected
non-degenerate equal-slices density of $A$ inside $V$ is also at least $3\delta/4$, from which it follows
that with probability at least $\delta/4$ this density is at least $\delta/2$.

Let $V$ be a subspace inside which $A$ has non-degenerate equal-slices density at least $\delta/2$.
Remove from $A \cap V$ all degenerate strings. The resulting set $A' \cap V$ still has density at
least $\delta/2$. By Corollary 3.3 again, this implies that the equal-slices density of $A'$ inside $V$
is at least $\delta/4$.

But by our choice of $m$ this means that with probability at least $\delta/4$ the set $A' \cap V$
contains a combinatorial line. Moreover, since $A' \cap V$ contains no degenerate strings, this
line must have fixed coordinates of every single value.

The number of such lines is at most $(k+1)^m$. Therefore, if you choose a random
special subspace and inside it you choose a line according to the non-degenerate
equal-slices measure, then with probability at least $(\delta/4)(k+1)^{-m}$ it will be a line in $A$.

But by Lemma 3.6 the way we have just chosen this line was according to the
non-degenerate equal-slices measure. By the proof of Corollary 3.3, the equal-slices probability
is at least $(\delta/4)(k+1)^{-m}(1 - (k+1)^2/n)$. By our assumption that $n > 4k^2/\delta$ (and that
$k \ge 2$), this is at least $(\delta/9)(k+1)^{-m}$. □

4. A MODIFICATION OF AN ARGUMENT OF AJTAI AND SZEMERÉDI

After Szemerédi proved his theorem on arithmetic progressions, it was natural to try to
prove the multidimensional version, which states that for every finite subset $H$ of $\mathbb{Z}^d$ and
every $\delta > 0$ there exists $N$ such that every subset $A$ of $[N]^d$ of size at least $\delta N^d$ contains
a subset of the form $aH + b$ with $a > 0$. A full proof of this result had to wait for the
ergodic approach of Furstenberg: the result is due to Furstenberg and Katznelson [FK78].
However, Ajtai and Szemerédi managed to prove the first genuinely multidimensional case
of the theorem, where $H$ is the set $\{(0,0), (1,0), (0,1)\}$, by means of a clever deduction
from Szemerédi's theorem itself. Their argument is based on a density-increment strategy,
but it is not organized in quite the way that was described in §1.4. However, it is possible
to reorganize the steps so that it follows that general outline very closely: in this section
we briefly sketch this slight modification of their argument because it provides a template
for our proof of the density Hales-Jewett theorem.

Let $\delta > 0$, let $N$ be a large integer, and let $A$ be a subset of $[N]^2$ of density at least $\delta$.
Our aim is to show that $A$ contains a triple of the form $\{(x, y), (x+d, y), (x, y+d)\}$ with
$d > 0$. We shall call such configurations corners. The theorem of Ajtai and Szemerédi is
the following.

Theorem 4.1. For every $\delta > 0$ there exists $N$ such that every subset $A \subset [N]^2$ of density
at least $\delta$ contains a triple $\{(x, y), (x+d, y), (x, y+d)\}$ with $d > 0$.

Before we sketch the proof, we make the general remark that there are three privileged
directions, horizontal, vertical and parallel to the line $x + y = 0$, which correspond to the
three lines that are defined by pairs of points from the set $\{(0,0), (1,0), (0,1)\}$. Indeed,
one could argue that the formulation of the problem is an unnatural one, and that instead
of the grid $[N]^2$ one should consider a triangular portion of a triangular lattice, so that
there is a symmetry between the three directions. We shall not do this, but when we come
to relate the argument of this section to the proof of DHJ, it will help to bear this point
in mind.

We shall regard certain subsets of $[N]^2$ as "simple" or "somewhat structured". We define
a 1-set to be a subset of the form $X \times [N]$. We call such sets 1-sets because whether or not a
point $(x, y)$ belongs to $X \times [N]$ depends only on its first coordinate $x$. A more symmetrical,
and therefore preferable, explanation is this. We represent our points not by pairs $(x, y)$
with $x, y \in [N]$ but as triples $(x, y, z)$ such that $x, y \in [N]$ and $x + y + z = 2N + 1$. (We
have chosen $2N + 1$ so that $z$ lies between 1 and $2N - 1$, but all we care about is that $x + y + z$
should be constant.) It is still true that whether or not the point represented by a triple
$(x, y, z)$ belongs to $X \times [N]$ depends only on $x$. In other words, if $(x, y, z)$ belongs to a
1-set, then so does $(x, y+u, z-u)$ for every $u$. Another way of looking at this, which turns
out to correspond more closely to what we shall do when we prove DHJ, is to think of a 1-set
as a 23-insensitive set, meaning that membership of the set is unaffected by changes to
the second and third coordinates.

Another special kind of set is one of the form $X \times Y$. This is the intersection of the 1-set
$X \times [N]$ and the 2-set $[N] \times Y$. In this section we shall call it a 12-set (which is not to be
confused with a 12-insensitive set, which we are calling a 3-set).

Now let us sketch the argument that gives us corners. The basic idea is a density
increment strategy, which has been used to prove many density theorems. (A few examples
can be found in [Rot53], [Sze75], [Gow01], [Shk06b], [Shk06a], and [LM08], but this is by
no means an exhaustive list.) We shall show that if $A$ does not contain a corner, then
there is some subset of $[N]^2$ that looks like $[m]^2$, and inside that subset $A$ has an increased
density. We can iterate this argument until eventually we reach a contradiction when the
relative density of $A$ inside some subset becomes greater than 1.

4.1. Finding a dense diagonal. The first step is to find a set of the form $\{(x, y) : x + y =
t\}$ that contains a reasonable number of points of $A$. Since there are $2N - 1$ such sets and
$A$ has size at least $\delta N^2$, at least one such set contains at least $\delta N/2$ points of $A$.

4.2. A dense 12-set that is disjoint from A. Suppose that we have found $t$ such that
the number of points of $A$ in the diagonal $\{(x, y) : x + y = t\}$ is at least $\delta N/2$. Let us
write these points as $(x_1, y_1), \dots, (x_{2m}, y_{2m})$ with $x_1 < \dots < x_{2m}$. If the number of points
of $A$ on the diagonal is odd, we just omit one of them. Let $X = \{x_1, \dots, x_m\}$ and let
$Y = \{y_{m+1}, \dots, y_{2m}\}$. Then no point of $X \times Y$ can belong to $A$, since if $(x_i, y_j) \in A$ then
the three points $(x_i, y_j)$, $(x_j, y_j)$ and $(x_i, y_i)$ all belong to $A$, and they form a corner since
$x_j - x_i = y_i - y_j > 0$. The size of $X \times Y$ is $m^2$, and $m \ge \lfloor \delta N/4 \rfloor$, so (ignoring the integer
part) $X \times Y$ has density at least $\delta^2/16$ or so.

4.3. A dense 12-set that correlates with A. If $A$ is disjoint from a dense 12-set $X \times Y$
then it must make up for this with an increased density in the complement of $X \times Y$.
However, the complement of $X \times Y$ splits up into the three 12-sets $X \times Y^c$, $X^c \times Y$ and
$X^c \times Y^c$. A simple averaging argument shows that in at least one of these three 12-sets
the relative density of $A$ is at least $\delta + \delta^3/48$. Thus, we have sets $U$ and $V$ such that the
density of $A$ inside the 12-set $U \times V$ is at least $\delta + \delta^3/48$. Moreover, a very crude argument
shows that $U \times V$ must have density at least $\delta^3/48$ inside $[N]^2$.
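
The first three steps can be traced on a small example. The sketch below is only an illustration under stated assumptions: the corner-free set is produced by a simple greedy procedure (not taken from the paper), and all function names are hypothetical. It finds the densest diagonal, builds $X$ and $Y$ as in §4.2, and measures the excess $|A \cap S| - \delta|S|$ on the three complementary 12-sets.

```python
from fractions import Fraction
from itertools import product


def has_corner(A, N):
    """True if A contains (x, y), (x + d, y), (x, y + d) with d > 0."""
    return any((x + d, y) in A and (x, y + d) in A
               for (x, y) in A for d in range(1, N))


def greedy_corner_free(N):
    """A corner-free subset of [N]^2 built by adding points one at a time."""
    A = set()
    for p in product(range(1, N + 1), repeat=2):
        A.add(p)
        if has_corner(A, N):
            A.remove(p)
    return A


def densest_diagonal(A):
    """The points of A on the diagonal x + y = t containing most of A."""
    by_sum = {}
    for (x, y) in A:
        by_sum.setdefault(x + y, []).append((x, y))
    return max(by_sum.values(), key=len)


def forbidden_product(diagonal):
    """X = first m x-coordinates, Y = last m y-coordinates, as in 4.2."""
    pts = sorted(diagonal)            # increasing x, hence decreasing y
    m = len(pts) // 2
    X = {x for x, _ in pts[:m]}
    Y = {y for _, y in pts[m:2 * m]}
    return X, Y


def best_excess(A, X, Y, N):
    """Largest |A ∩ S| - delta |S| over the three complementary 12-sets S."""
    full = set(range(1, N + 1))
    delta = Fraction(len(A), N * N)
    pieces = [(X, full - Y), (full - X, Y), (full - X, full - Y)]
    return max(sum((x, y) in A for x in U for y in V) - delta * len(U) * len(V)
               for U, V in pieces)
```

Since $A$ misses $X \times Y$, the three excesses add up to exactly $\delta|X||Y|$, so the largest of them is at least a third of that; this is the averaging argument of §4.3 in miniature.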

4.4. A dense 1-set can be almost entirely partitioned into large grids. As
mentioned earlier, our eventual aim is to find a subset of $[N]^2$ of a similar type, inside which
$A$ has increased density. The subsets that will interest us are grids, which are sets of the
form $P \times Q$, where $P$ is an arithmetic progression and $Q$ is a translate of $P$.

Given a dense 1-set $X \times [N]$, we can partition almost all of it into grids as follows.
Suppose that the density of $X$ is $\theta$ and let $\epsilon$ be some positive constant that is much smaller
than $\theta$ (but independent of $N$). Since $X$ has density at least $\epsilon$, by Szemerédi's theorem it
contains an arithmetic progression $P_1$ of length at least $m$, where $m$ tends to infinity with
$N$. If the set $X \setminus P_1$ still has density at least $\epsilon$, then it contains an arithmetic progression
$P_2$ of length $m$. Indeed, we can partition $X$ into sets $P_0, P_1, \dots, P_r$, where $P_1, \dots, P_r$ are
arithmetic progressions of length at least $m$ and $P_0$ is a residual set of density less than $\epsilon$.

For each $i$, we can then straightforwardly partition almost all of $P_i \times [N]$ into sets of
the form $P_i \times Q_{ij}$, where each $Q_{ij}$ is a translate of $P_i$. (It helps if each $P_i$ has diameter at
most $\epsilon N$, but it is easy to ensure that this is the case.) We can therefore partition all but
an arbitrarily small proportion of $X \times [N]$ into grids of size tending to infinity with $N$.

4.5. A dense 12-set can be almost entirely partitioned into large grids. It is easy
to deduce from the previous step a similar statement about 12-sets. Indeed, let $X$ and $Y$
be dense sets, and begin by partitioning almost all of $X \times [N]$ into large grids $P_i \times Q_i$. (We
have changed the indexing of these grids.) The intersection of $X \times Y$ with any of these
grids $P_i \times Q_i$ is $P_i \times (Y \cap Q_i)$, since $P_i \subset X$. Therefore, if $Y \cap Q_i$ has positive density inside
$Q_i$, we can use the previous step to partition almost all of $P_i \times (Y \cap Q_i)$ into subgrids, still
with size tending to infinity. By a simple averaging argument, the proportion of points in
$X \times Y$ that are contained in grids $P_i \times Q_i$ inside which $Y$ is sparse is small. So by this
means we have partitioned almost all of $X \times Y$ into grids with sizes that tend to infinity.

4.6. A density increment on a large grid. By Step 3, we have a dense 12-set $X \times Y$
inside which the density of $A$ is at least $\delta + \delta^3/48$. By Step 5 we can partition almost all
of $X \times Y$ into large grids. If we choose "almost" appropriately, we can ensure that the
density of that part of $A$ that lies in these large grids is at least $\delta + \delta^3/100$. But then by
averaging we can find a large grid $P \times Q$ such that the density of $A$ inside $P \times Q$ is at least
$\delta + \delta^3/100$. This is exactly what we need for our density-increment strategy, so the proof
is complete.

5. A DETAILED SKETCH OF A PROOF OF DHJ$_3$

In this section, we shall explain in some detail how our proof works in the case $k = 3$.
As mentioned in the previous section, the structure of our proof is closely modelled on
the structure of the argument of Ajtai and Szemerédi (in the slightly modified form in
which we have presented it). However, to make that clear, we need to explain what the
counterparts are of concepts such as "grid", "12-set" and the like. So let us begin by
discussing a dictionary that will guide us in our proof.

Everything flows from the following simple thought: whereas a typical point in $[N]^2$ can
be thought of as a triple $(x, y, z)$ such that $x + y + z = 2N + 1$, a typical point in $[3]^n$ can
be thought of as a triple of disjoint sets $(X, Y, Z)$ such that $X \cup Y \cup Z = [n]$: to turn such
a triple into a sequence $(x_1, \dots, x_n)$ let $x_i = 1$ if $i \in X$, 2 if $i \in Y$ and 3 if $i \in Z$.

A corner in $[N]^2$ can be defined symmetrically as a triple of points of the form $\{(x +
u, y, z), (x, y + u, z), (x, y, z + u)\}$ such that $x + y + z + u = 2N + 1$ and $u \ne 0$. This translates
very nicely: a combinatorial line is a triple of points of the form $\{(X \cup U, Y, Z), (X, Y \cup
U, Z), (X, Y, Z \cup U)\}$ such that $X$, $Y$, $Z$ and $U$ partition $[n]$ and $U \ne \emptyset$.
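
The translation between triples of sets and sequences is mechanical, as the following small sketch shows. It is an illustration only; the function names are hypothetical, and coordinates are numbered $1, \dots, n$ as in the text.

```python
def to_sequence(X, Y, Z, n):
    """Turn a partition (X, Y, Z) of {1, ..., n} into a point of [3]^n."""
    assert X | Y | Z == set(range(1, n + 1)) and len(X) + len(Y) + len(Z) == n
    return tuple(1 if i in X else 2 if i in Y else 3 for i in range(1, n + 1))


def line_from_sets(X, Y, Z, U, n):
    """The three points (X ∪ U, Y, Z), (X, Y ∪ U, Z), (X, Y, Z ∪ U)."""
    assert U, "the wildcard set U must be non-empty"
    return [to_sequence(X | U, Y, Z, n),
            to_sequence(X, Y | U, Z, n),
            to_sequence(X, Y, Z | U, n)]


def is_combinatorial_line(points):
    """Check that three points of [3]^n, in order, form a combinatorial line."""
    p1, p2, p3 = points
    coords = range(len(p1))
    wildcard = [i for i in coords if (p1[i], p2[i], p3[i]) == (1, 2, 3)]
    fixed_ok = all(p1[i] == p2[i] == p3[i] for i in coords if i not in wildcard)
    return bool(wildcard) and fixed_ok
```

For instance, with $X = \{1\}$, $Y = \{2\}$, $Z = \{3\}$ and $U = \{4, 5\}$ the three points agree on the first three coordinates and take the common values 1, 2 and 3 respectively on the wildcard set $U$.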

A diagonal in $[N]^2$ is a set of the form $D_t = \{(x, y, z) : x + y = t\}$. It therefore makes
sense to define a "diagonal" in $[3]^n$ to be a set of the form $\{(X, Y, Z) : X \cup Y = T\}$ for some
subset $T \subset [n]$. In other words, it is the collection of all triples $(X, Y, Z)$ that partition
$[n]$, but now $Z$ is a fixed set (equal to the complement of $T$ above).

Recall that a 1-set in $[N]^2$ is a set of the form $X \times [N]$, or in symmetric notation a set
of the form $\{(x, y, z) : x \in X\}$. The obvious generalization of this notion to $[3]^n$ is a set of
the form $\{(X, Y, Z) : X \in \mathcal{X}\}$ for some collection $\mathcal{X}$ of subsets of $[n]$. A subset $S$ of $[3]^n$
is a 1-set if and only if it is 23-insensitive in the following sense: if $(X, Y, Z) \in S$, then
$(X, Y', Z') \in S$ whenever $Y' \cup Z' = Y \cup Z$. Equivalently, if a sequence $x \in [3]^n$ belongs to
$S$, then so do all sequences that can be formed from $x$ by changing some 2s to 3s and/or
some 3s to 2s.
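
The equivalence between 1-sets and 23-insensitive sets can be made concrete as follows. This is a minimal sketch with hypothetical function names; a collection $\mathcal{X}$ is represented as a Python set of frozensets of coordinates in $\{1, \dots, n\}$.

```python
from itertools import product


def is_23_insensitive(S, n):
    """True if membership of S depends only on which coordinates equal 1."""
    S = set(S)
    for x in S:
        free = [i for i in range(n) if x[i] != 1]
        # Every way of re-filling the non-1 coordinates with 2s and 3s
        # must give another element of S.
        for values in product((2, 3), repeat=len(free)):
            y = list(x)
            for i, v in zip(free, values):
                y[i] = v
            if tuple(y) not in S:
                return False
    return True


def one_set(collection, n):
    """The 1-set {(X, Y, Z) : X in collection} as a set of points of [3]^n."""
    return {x for x in product((1, 2, 3), repeat=n)
            if frozenset(i + 1 for i in range(n) if x[i] == 1) in collection}
```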

The natural definition of a 12-set is now clear: as in the case of subsets of $[N]^2$, it should
be the intersection of a 1-set with a 2-set.

We should also mention that the notion of Cartesian product has an analogue. The
Cartesian product of $X$ and $Y$ is the intersection of the 1-set $X \times [N]$ with the 2-set
$[N] \times Y$. So if we are given two collections $\mathcal{X}$ and $\mathcal{Y}$ of subsets of $[n]$, then the analogue
of their Cartesian product ought to be the 12-set $\{(X, Y, Z) : X \in \mathcal{X}, Y \in \mathcal{Y}, X \cap Y = \emptyset\}$.
Since $X$ and $Y$ determine $Z$, we can think of this as a set of pairs, and then the resemblance
with a true Cartesian product is that much closer: it is (equivalent to) the set of all pairs
$(X, Y)$ such that $X \in \mathcal{X}$, $Y \in \mathcal{Y}$ and $X$ and $Y$ are disjoint. We shall call this the disjoint
product of $\mathcal{X}$ and $\mathcal{Y}$ and write it as $\mathcal{X} \boxtimes \mathcal{Y}$.

There is one concept that has a non-obvious (though still natural) translation from the
$[N]^2$ world to the $[3]^n$ world, namely that of a grid. At first sight, it might seem extremely
unlikely that the Ajtai-Szemerédi argument can be generalized to give a proof of DHJ$_3$. After all,
their proof could be regarded as the beginnings of a sort of induction: they deduce the
first non-trivial case of the two-dimensional theorem from the full one-dimensional theorem
(namely Szemerédi's theorem). If one is attempting to prove DHJ$_3$, the obvious candidate
for a statement "one level down" is DHJ$_2$, but that is a much less deep statement than
Szemerédi's theorem. So it seems that our only hope will be if Ajtai and Szemerédi did
not after all need a tool as powerful as Szemerédi's theorem.

One of the key ideas of our proof is that this is indeed the case, though the result
we need is not DHJ$_2$ but its multidimensional version MDHJ$_2$ proved in the last section.
The appropriate replacement of the notion of a long arithmetic progression in $[N]$ is a
combinatorial subspace of $[2]^n$. We then have to decide what the analogue of a grid is.
Given the concepts so far, it should be something like the disjoint product of two "parallel"
combinatorial subspaces of $[2]^n$, and we would like that to give us a combinatorial subspace
of $[3]^n$ (since we want the analogue of a grid to be a structure that resembles $[3]^n$). All
this can be done. A $d$-dimensional combinatorial subspace of $[2]^n$ is defined by taking
disjoint sets $X_0, X_1, \dots, X_d$ and defining $U$ to be the set of all unions $X_0 \cup \bigcup_{i \in A} X_i$ such
that $A \subset [d]$. It is natural to define two such subspaces to be parallel if they are defined by
sequences of sets $(X_0, X_1, \dots, X_d)$ and $(Y_0, Y_1, \dots, Y_d)$ such that $X_i = Y_i$ for every $i \ge 1$,
and also, since we want to take a disjoint product, to add the condition that $X_0$ and $Y_0$
should be disjoint. If we do that, then a typical point in the disjoint product is a pair
$(X, Y)$ such that $X = X_0 \cup \bigcup_{i \in A} X_i$ and $Y = Y_0 \cup \bigcup_{i \in B} X_i$ with $A \cap B = \emptyset$. If we set
$Z = [n] \setminus (X \cup Y)$, we see easily that this is precisely a $d$-dimensional combinatorial subspace
of $[3]^n$: $X_0$ and $Y_0$ are the sets where the fixed coordinates are 1 and 2, respectively, and
the wildcard sets are $X_1, \dots, X_d$.
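
The claim that the disjoint product of two parallel subspaces of $[2]^n$ is a combinatorial subspace of $[3]^n$ can be checked directly on an example. In the sketch below (hypothetical names; coordinates numbered $1, \dots, n$), `blocks` plays the role of the wildcard sets $X_1, \dots, X_d$, and any coordinate outside $X_0$, $Y_0$ and the blocks is a fixed coordinate with value 3.

```python
from itertools import product


def to_point(X, Y, n):
    """The point of [3]^n with 1s on X, 2s on Y and 3s elsewhere."""
    return tuple(1 if i in X else 2 if i in Y else 3 for i in range(1, n + 1))


def disjoint_product_points(X0, Y0, blocks, n):
    """All (X, Y, Z) with X = X0 ∪ (blocks indexed by A), Y = Y0 ∪ (blocks
    indexed by B), where A and B are disjoint subsets of the block indices."""
    d = len(blocks)
    points = set()
    for a_mask, b_mask in product(range(1 << d), repeat=2):
        if a_mask & b_mask:
            continue                      # A and B must be disjoint
        X, Y = set(X0), set(Y0)
        for i in range(d):
            if a_mask >> i & 1:
                X |= blocks[i]
            if b_mask >> i & 1:
                Y |= blocks[i]
        points.add(to_point(X, Y, n))
    return points


def subspace_points(X0, Y0, blocks, n):
    """The subspace of [3]^n with fixed 1s on X0, fixed 2s on Y0, fixed 3s on
    the remaining non-wildcard coordinates, and wildcard sets `blocks`."""
    points = set()
    for values in product((1, 2, 3), repeat=len(blocks)):
        x = list(to_point(X0, Y0, n))
        for block, v in zip(blocks, values):
            for i in block:
                x[i - 1] = v
        points.add(tuple(x))
    return points
```

Each pair of disjoint index sets $(A, B)$ corresponds to assigning the value 1, 2 or 3 to each wildcard set, which is why the two enumerations agree and have $3^d$ elements.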

With these concepts in mind, let us now give an overview of the proof of DHJ$_3$. (To
generalize this discussion to DHJ$_k$ is straightforward: the Ajtai-Szemerédi argument can be
used to deduce a "$k$-dimensional corners" theorem from the $(k-1)$-dimensional Szemerédi
theorem, and that provides a template for our deduction of DHJ$_{k+1}$ from MDHJ$_k$, which
itself can be deduced from PDHJ$_k$, which follows from DHJ$_k$.)

5.1. Finding a dense diagonal. Recall that we are defining a diagonal in $[3]^n$ to be a
set of the form $\{(X, Y, Z) : X \cup Y = T\}$. Equivalently, one fixes a set $Z$ and defines the
associated diagonal to be the set of all sequences in $[3]^n$ that take the value 3 in $Z$ and 1
or 2 everywhere else.

Obviously the diagonals form a partition of $[3]^n$, so if $A \subset [3]^n$ is a set of density $\delta > 0$
then by averaging we can find a diagonal inside which $A$ still has density $\delta$. We can also
ensure that this diagonal is not too small by throwing away the very small fraction of $[3]^n$
that is contained in small diagonals.

It is not completely obvious at this stage what probability measure we want to take on
$[3]^n$, but note that the argument so far is general enough to apply to any measure.

5.2. A dense 12-set that is disjoint from A. What should we do next? In the equiv- 
alent stage of the corners argument we were assuming that A contained no corners. Then 
every pair of points of A in our dense diagonal implied that a third point (the bottom of 
the corner of which those two points formed the diagonal) did not belong to A. Moreover, 
the set of points that we showed did not belong to A formed a dense 12-set. So now we 
would like to do something similar. 

At first, the situation looks very promising, since if $(X, Y, Z)$ and $(X', Y', Z)$ are two
points with $X \subset X'$, both belonging to the diagonal determined by the set $Z$, then we
can set $U = X' \setminus X$ and write these two points as $(X, Y \cup U, Z)$ and $(X \cup U, Y, Z)$. Then
the point $(X, Y, Z \cup U)$ cannot lie in $A$, since otherwise the three points would form a
combinatorial line in $A$.

So what can we say about the set of all forbidden points? These are all points of the
form $(X, Y, Z \cup U)$ such that both $(X \cup U, Y, Z)$ and $(X, Y \cup U, Z)$ belong to $A$. Now
$Z$ is a fixed set (that defines the particular diagonal we are talking about), so if we are
presented with a point $(X, Y, Z \cup U)$ then we can work out what $U$ is. Let $\mathcal{X}$ be the set
of all $X \subset [n] \setminus Z$ such that $(X, [n] \setminus (X \cup Z), Z) \in A$. Then the set of all $(X, Y, Z \cup U)$
such that $(X, Y \cup U, Z) \in A$ is precisely the set of all $(X, Y, Z \cup U)$ such that $X \in \mathcal{X}$.
This would be a 1-set if we were not insisting that every point took the value 3 in the set
$Z$. However, the set of all such points forms a subspace of $[3]^n$ (of dimension $n - |Z|$),
and inside that subspace we have a 1-set. Similarly, the set of all $(X, Y, Z \cup U)$ such that
$(X \cup U, Y, Z) \in A$ is a 2-set inside the same subspace: this time we define $\mathcal{Y}$ to be the set
of all $Y$ such that $([n] \setminus (Y \cup Z), Y, Z) \in A$ and take the set of all points $(X, Y, Z \cup U)$ such
that $Y \in \mathcal{Y}$.

Thus, the good news is that we have found a 12-set that is disjoint from $A$, but the bad
news is that this 12-set is in a subspace of $[3]^n$ rather than in the whole space.
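
The construction of the forbidden 12-set can be illustrated on a small line-free set. In this sketch the line-free set is produced greedily (an assumption made for the example, not the paper's method), coordinates are numbered from 0, and all names are hypothetical. For a fixed $Z$ it builds the collections $\mathcal{X}$ and $\mathcal{Y}$ from the diagonal and lists the points $(X, Y, Z \cup U)$ with $X \in \mathcal{X}$, $Y \in \mathcal{Y}$ and $U \ne \emptyset$.

```python
from itertools import product


def lines(n):
    """All combinatorial lines of [3]^n, each as a triple of points."""
    for template in product((1, 2, 3, "*"), repeat=n):
        if "*" in template:
            yield tuple(tuple(v if c == "*" else c for c in template)
                        for v in (1, 2, 3))


def greedy_line_free(n):
    """A line-free subset of [3]^n built by adding points one at a time."""
    all_lines = list(lines(n))
    A = set()
    for p in product((1, 2, 3), repeat=n):
        A.add(p)
        if any(all(q in A for q in line) for line in all_lines):
            A.remove(p)
    return A


def forbidden_points(A, Z, n):
    """Points (X, Y, Z ∪ U), U non-empty, with X in the collection built
    from the diagonal of Z and Y in the corresponding second collection."""
    rest = sorted(set(range(n)) - set(Z))
    rest_set = frozenset(rest)

    def point(X, Y):
        return tuple(1 if i in X else 2 if i in Y else 3 for i in range(n))

    subsets = [frozenset(rest[j] for j in range(len(rest)) if mask >> j & 1)
               for mask in range(1 << len(rest))]
    coll_X = [X for X in subsets if point(X, rest_set - X) in A]
    coll_Y = [Y for Y in subsets if point(rest_set - Y, Y) in A]
    return {point(X, Y) for X in coll_X for Y in coll_Y
            if not (X & Y) and (X | Y) != rest_set}
```

If $A$ contains no combinatorial line then none of the returned points can lie in $A$, since each of them would complete a line with two points of the diagonal.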

5.3. A dense 12-set that correlates with A. In the proof of the result about corners, 
we used a simple averaging argument at this stage: if there is a dense 12-set that is disjoint 
from A then one of three other 12-sets must have an unexpectedly large intersection with 
A. However, we cannot argue as straightforwardly here, since the 12-set we have found is 
not dense. 

There are in fact two problems here. The first is the obvious one that we have restricted
to a subspace, the density of which will be very small. To see this, note that for almost all
points $(X, Y, Z)$ in $[3]^n$ the sets $X$, $Y$ and $Z$ have size very close to $n/3$. Therefore, it may
well be that $A$ consists solely of such points, in which case when we pass to the subspace
that takes the value 3 on some fixed $Z$ we will lose approximately $n/3$ dimensions.

The second problem is that even when we do restrict to such a subspace we find that $A$
may well have tiny density, since almost all triples in such a subspace will be of the form
$(X, Y, Z \cup U)$ with $X$, $Y$ and $U$ all of approximately the same size, and it may well be that
no such triples belong to $A$, since then $X$, $Y$ and $Z \cup U$ do not all have approximately the
same size.

To get round these problems, we do two things. First, we do not use the uniform measure
on $[3]^n$ but instead the equal-slices measure. This deals with the second problem, since for
an equal-slices random triple $(X, Y, Z)$ it is no longer the case that the sets $X$, $Y$ and $Z$
almost always have approximately the same size. Second, we argue that we may assume
that the restriction of $A$ to almost all subspaces has density at least $\delta - \eta$ for some very
small $\eta$. This observation is standard in proofs of density theorems: roughly speaking, if
$A$ often has smaller density than this, then somewhere it must have substantially larger
density (by averaging), and then we have completed the iteration step in a particularly
simple way. But if $A$ almost always has density at least $\delta - \eta$, then when we use an
averaging argument to find a diagonal that contains many points of $A$, we can also ask for
$A$ to have density at least $\delta - \eta$ inside the subspace we are forced to drop down to.

Once all these arguments have been made precise, the conclusion is that there is a
subspace $V$ of $[3]^n$ of reasonably large dimension such that the density of $A$ inside $V$ is
at least $\delta - \eta$, and a dense 12-set inside that subspace that is disjoint from $A$. Then a
simple averaging argument similar to the one in the corners proof gives us a dense 12-set
in that subspace inside which the relative density of $A$ is at least $\delta + c(\delta)$. (For this we
must make sure we choose $\eta$ sufficiently small for the small density decrease to be more
than compensated for by the subsequent density increase.)

Thus, although the statement and proof of this step are directly modelled on the corre- 
sponding step for the corners proof, there are some important differences: we show that 
A correlates locally (that is, in some subspace of density that tends to zero) with a 12- 
set, whereas in the corners proof a global correlation is found. We do not know whether 
a dense subset of $[3]^n$ that contains no combinatorial line must correlate globally with a
12-set. (Strictly speaking, we do know, since we have proved that every dense subset of
$[3]^n$ contains a combinatorial line. However, one can obtain a better formulation of the
question by replacing the assumption that the set contains no lines by the assumption that
it contains few lines.) A second difference is that although we start with a set $A$ that is
equal-slices dense, the local correlation that the proof ends up giving is with respect to the 
uniform measure. (There is a general principle operating here, which is that equal-slices 
measure does not behave well when you restrict to combinatorial subspaces.) 

5.4. A dense 1-set can be almost entirely partitioned into large combinatorial 
subspaces. Bearing in mind our dictionary, the next stage of the proof should be to 
partition almost all of a dense 1-set into combinatorial subspaces of dimension tending to 
infinity. 

Let us recall what a 1-set, or a 23-insensitive set, is. It is a set $A \subseteq [3]^n$ with the
property that if $x \in A$, $y \in [3]^n$ and $\{i : x_i = 1\} = \{i : y_i = 1\}$, then $y \in A$. Equivalently,
using set-theoretic notation, it is a set of triples of the form $\{(X, Y, Z) : X \in \mathcal{X}\}$ for some
collection $\mathcal{X}$ of subsets of $[n]$.

At this stage of the corners proof, one starts with a 1-set $X \times [N]$, applies Szemeredi's
theorem over and over again to remove arithmetic progressions $P_i$ from $X$ until it is no
longer dense, and then partitions the sets $P_i \times [N]$ into sets of the form $P_i \times Q_{ij}$, where
the $Q_{ij}$ are translates of $P_i$.

If we follow the proof of the corners theorem, then we should expect an argument along
the following lines. We start with the 1-set $\{(X,Y,Z) : X \in \mathcal{X}\}$. We then partition
almost all of $\mathcal{X}$, which can be thought of as a subset of $[2]^n$, into large combinatorial
subspaces using repeated applications of MDHJ$_2$. For each one of these subspaces $U$, we
then partition the disjoint product $U \bowtie [3]^n$ into combinatorial subspaces.

Unfortunately, this last step does not work, which leads us to the second point where
our argument is more complicated than that of Ajtai and Szemeredi, and the second place
where we use localization to get us out of trouble. The difficulty is this. If $U$ is the
$d$-dimensional subspace defined by the sets $(X_0, X_1, \dots, X_d)$, then $U \bowtie [3]^n$ consists of all
triples $(X, Y, Z)$ of disjoint sets such that $X$ is a union of $X_0$ with some of the sets $X_i$. A
combinatorial subspace inside this set must have wildcard sets that are unions of the $X_i$
with $i \geq 1$, which means that it cannot contain any point $(X, Y, Z)$ such that $Y \cap X_i$ and
$Z \cap X_i$ are non-empty for every $i$.

This is a genuine difficulty, but we can get round it. The way we do so may at first look 
a little dangerous, but it turns out to work. The argument proceeds in five steps as follows. 

• Let $B$ be a 23-insensitive set of density $\eta$. Let $m$ be a positive integer to be chosen
later (for now it is sufficient to think of it as a number that tends to infinity but
is much much smaller than $n$), and choose a random element of $[3]^n$ by randomly
permuting the ground set $[n]$ and then taking a pair $(x,y)$, where $x$ is chosen
uniformly from $[2]^m$ and $y$ is chosen uniformly from $[3]^{n-m}$. (Here we are regarding
$x$ as supported on the first $m$ elements of the randomly permuted ground set and $y$
as supported on the last $n - m$ elements.) For sufficiently small $m$, the distribution
of $(x,y)$ is approximately uniform, so if for each $y$ we let $E_y = \{x : (x,y) \in B\}$,
then $E_y$ has density at least $\eta/3$ in $[2]^m$ for a set of $y$ of density at least $\eta/3$. (This
is not the main reason that we need $m$ to be small, so this step will be true with a
great deal of room to spare.)

• For each such $y$ use MDHJ$_2$ to find a $d$-dimensional combinatorial subspace $U$ of
$[2]^m$ that lives inside $E_y$, and hence has the property that $(x,y) \in B$ for every
$x \in U$. (Here, $d$ depends on $m$ and $\eta$.)

• By the pigeonhole principle, we can find a subset $T$ of $[3]^{n-m}$ of density $\theta =
\theta(m,d,\eta)$ and a combinatorial subspace $U \subseteq [2]^m$ such that $U \times T \subseteq B$. Let
us choose $T$ to be maximal: that is, $T$ is the set of all $y \in [3]^{n-m}$ such that
$U \times \{y\} \subseteq B$. Since $B$ is a 23-insensitive set, it follows that if we allow the
wildcard sets of $U$ to take the value 3 as well, then all the resulting points will still
belong to $B$. That is, we have the same statement as above but now $U$ is a
combinatorial subspace of $[3]^m$. This is the point of our argument "where the induction
happens".

• $U \times T$ is a union of combinatorial subspaces, and there are quite a lot of them. It is
tempting at this stage to remove them from $B$ and start again. But unfortunately
there is no reason to suppose that $B \setminus (U \times T)$ will be 23-insensitive. (We give an
example to illustrate this just after this proof outline.) However, this turns out not
to be too serious a problem, because for every $x \in X$ the set $(B \setminus (U \times T)) \cap (\{x\} \times
[3]^{n-m})$ is a 23-insensitive subset of $\{x\} \times [3]^{n-m}$. In other words, we can partition
$B \setminus (U \times T)$ into locally 23-insensitive sets and run the argument again.

• Using this basic idea, we develop an iterative proof. Whenever we are faced with 
a set of small density we regard it as part of our "error set" and leave it alone. 
And from any set of large density we remove a disjoint union of combinatorial 
subspaces and partition the rest into locally 23-insensitive sets. If we are careful, 
we can choose m in such a way that the combinatorial subspaces have dimension 
that tends to infinity with n, but the number of iterations before there are no dense 
sets left is smaller than n/m, so we never "run out of dimensions". In this way 
we prove that a 23-insensitive set can almost all be partitioned into combinatorial 
subspaces. 

Here, as promised, is an example of a 23-insensitive set $B$ such that removing $U \times T$
leaves us with a set that is no longer 23-insensitive. Let $m = 2$ and $n = 3$ and let $B$ be
the 23-insensitive set $\{11,22,23,32,33\} \times \{2,3\}$. Then $B$ contains the set $\{11,22,33\} \times
\{2,3\}$, which is of the form $U \times T$ with $U$ a subspace and $T$ 23-insensitive (and it is
the only non-empty subset of this form with $T$ maximal). If we remove this from $B$, we end
up with the set $\{23,32\} \times \{2,3\}$, which is no longer 23-insensitive. It is, however,
23-insensitive in the third coordinate.
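
The example can be checked by brute force. The following Python sketch is ours rather than part of the argument: the helper names `is_23_insensitive` and `times` are hypothetical, points of $[3]^3$ are represented as tuples, and $T$ is taken to be $\{2,3\}$, the largest set for which $U \times T \subseteq B$.

```python
from itertools import product

def is_23_insensitive(S, coords):
    """True if, for every point of S, changing 2s to 3s or 3s to 2s in any of
    the listed coordinates always gives another point of S."""
    S = set(S)
    for x in S:
        free = [i for i in coords if x[i] in (2, 3)]
        for vals in product((2, 3), repeat=len(free)):
            y = list(x)
            for i, v in zip(free, vals):
                y[i] = v
            if tuple(y) not in S:
                return False
    return True

def times(P, Q):
    """Cartesian product of two sets of tuples, with points concatenated."""
    return {p + q for p in P for q in Q}

B = times({(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}, {(2,), (3,)})
U_times_T = times({(1, 1), (2, 2), (3, 3)}, {(2,), (3,)})
rest = B - U_times_T

print(is_23_insensitive(B, range(3)))      # B itself, all coordinates
print(is_23_insensitive(rest, range(3)))   # B \ (U x T), all coordinates
print(is_23_insensitive(rest, [2]))        # B \ (U x T), third coordinate only
```

The three checks report that $B$ is 23-insensitive, that $B \setminus (U \times T)$ is not, and that $B \setminus (U \times T)$ is 23-insensitive once attention is restricted to the third coordinate.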

5.5. A dense 12-set can be almost entirely partitioned into large combinatorial 
subspaces. This stage of the argument is very similar to the corresponding stage of the 
corners argument and needs little comment. One simply checks that the intersection of a 13- 
insensitive set with a combinatorial subspace is 13-insensitive inside that subspace (which 
is almost trivial). Then, given an intersection of a 23-insensitive set and a 13-insensitive 
set, one applies the result of the previous section to the 23-insensitive set, partitioning 
almost all of it into subspaces, and then applies the same argument to the 13-insensitive 
set inside each subspace. 

5.6. A density increment on a large combinatorial subspace. Again, this stage of 
the argument is very similar to the corresponding stage of the corners argument. If A has 
increased density on a (locally) 23-insensitive set, and if that set can be almost entirely 
partitioned into combinatorial subspaces of dimension tending to infinity, then by averaging 
we must be able to find one of these combinatorial subspaces inside which A has increased 
density.

We are not quite in a position to iterate at this point, because we started out with a
set of equal-slices measure $\delta$ and ended up finding a combinatorial subspace on which the
uniform density had gone up. However, it turns out to be quite easy to pass from that to
a further subspace inside which $A$ has an equal-slices density increment, at which point we
are done.

6. Measure for measure 

As we have already mentioned, there are some arguments that work better when we 
use product measures, and others when we use equal-slices measures. This appears to be 
an unavoidable situation, so we need a few results that will tell us that if we can prove a 
statement in terms of one measure then we can deduce a statement in terms of another. In 
this section, we shall collect together a number of such results, so that later on in the paper 
we can simply apply them when the need arises. The results we prove are just technical 
calculations, so the reader may prefer to take them on trust. The statements we shall need
later are Corollary 6.4, Corollary 6.5 and Lemma 6.6.

We begin with a standard definition that will tell us when we regard two probability 
measures as being close. 

Definition. Let $\mu$ and $\nu$ be two probability measures on a finite set $X$. The total variation
distance $d(\mu, \nu)$ is defined to be $\max_{A \subseteq X} |\mu(A) - \nu(A)|$.

In order to prove that we can switch from one probability measure to another, we shall 
make use of the following very simple general principle. 

Lemma 6.1. Let $\mu$ and $\nu_1, \dots, \nu_m$ be probability measures, let $a_1, \dots, a_m$ be positive real
numbers that add up to 1, and suppose that $d(\mu, \sum_{i=1}^m a_i\nu_i) \leq \eta$. Then for every $\alpha \in [0, 1]$
and every set $A$ such that $\mu(A) \geq \alpha$ there exists $i$ such that $\nu_i(A) \geq \alpha - \eta$.

Proof. From our assumptions it follows that $\sum_{i=1}^m a_i\nu_i(A) \geq \alpha - \eta$, so by averaging it
follows that there exists $i$ such that $\nu_i(A) \geq \alpha - \eta$. □
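
To make the definition and Lemma 6.1 concrete, here is a small Python sketch that computes the total variation distance by brute force over all subsets and checks the conclusion of the lemma on a toy example. The measures, the weights and the function names (`tv_distance`, `mixture`, `lemma_6_1_holds`) are illustrative assumptions of ours, not objects from the paper.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(X):
    """All subsets of X, as tuples."""
    X = list(X)
    return chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))

def measure(mu, A):
    return sum(mu[x] for x in A)

def tv_distance(mu, nu):
    """max over subsets A of |mu(A) - nu(A)|, computed by brute force."""
    return max(abs(measure(mu, A) - measure(nu, A)) for A in subsets(mu))

def mixture(weights, nus):
    """The convex combination sum_i a_i nu_i."""
    return {x: sum(a * nu[x] for a, nu in zip(weights, nus)) for x in nus[0]}

def lemma_6_1_holds(mu, nus, eta):
    """For every A, some nu_i gives A measure at least mu(A) - eta."""
    return all(
        any(measure(nu, A) >= measure(mu, A) - eta for nu in nus)
        for A in subsets(mu)
    )

half, zero = Fraction(1, 2), Fraction(0)
mu = {x: Fraction(1, 4) for x in range(4)}          # uniform on {0,1,2,3}
nus = [
    {0: half, 1: half, 2: zero, 3: zero},
    {0: zero, 1: zero, 2: half, 3: half},
]
weights = [Fraction(5, 8), Fraction(3, 8)]

eta = tv_distance(mu, mixture(weights, nus))
print(eta, lemma_6_1_holds(mu, nus, eta))
```

Exact rational arithmetic is used so that the comparison in `lemma_6_1_holds` is not affected by rounding.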

6.1. From uniform measure to equal-slices measure. Before we apply Lemma 6.1,
let us prove a simple but useful technical lemma.

Lemma 6.2. Let $x$ be an element of $[k]^n$ chosen uniformly at random, and for each $j \in [k]$
let $X_j = \{i : x_i = j\}$. Then with probability at least $1 - 2k\exp(-2n^{1/4})$ the sets $X_j$ all have
size between $n/k - n^{5/8}$ and $n/k + n^{5/8}$.

Proof. The size of $X_j$ is binomial with parameters $n$ and $1/k$. Standard bounds for the
tail of the binomial distribution therefore tell us that the probability that $|X_j|$ differs from
$n/k$ by at least $r$ is at most $2\exp(-2r^2/n)$. (This particular bound follows from Azuma's
inequality.) The result follows on taking $r = n^{5/8}$ and summing over the $k$ values of $j$. □
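
As a numerical illustration of the tail bound used in this proof, the following Python sketch compares the exact binomial tail with $2\exp(-2r^2/n)$. The function names and the sample parameters $n = 256$, $k = 3$, $r = 256^{5/8} = 32$ are our own choices, made only so that the computation is small.

```python
from math import comb, exp

def binomial_tail(n, k, r):
    """Exact probability that a Binomial(n, 1/k) variable differs from n/k
    by at least r."""
    total = 0.0
    for s in range(n + 1):
        if abs(s - n / k) >= r:
            total += comb(n, s) * (k - 1) ** (n - s) / k ** n
    return total

def tail_bound(n, r):
    """The bound 2 exp(-2 r^2 / n) quoted in the proof of Lemma 6.2."""
    return 2 * exp(-2 * r * r / n)

n, k, r = 256, 3, 32
print(binomial_tail(n, k, r), tail_bound(n, r))
```

Multiplying the bound by $k$ gives the union bound over the $k$ sets $X_j$ that appears in the statement of the lemma.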

As a first application of Lemma 6.1, we shall prove that a set of uniform density $\delta$ has
equal-slices density almost as great on some combinatorial subspace. The actual result we
shall prove is, however, slightly more general. To set it up, we shall need a little notation.

Let $m < n$, let $\sigma$ be an injection from $[m]$ to $[n]$, let $J = \sigma([m])$ and let $\bar{J}$ be the
complement of $J$. Then we can write each element of $[k]^n$ as a pair $(x, y)$ with $x \in [k]^J$ and
$y \in [k]^{\bar{J}}$. An element of $[k]^J$ is a function from $J$ to $[k]$. Given an element $x = (x_1, \dots, x_m)$
of $[k]^m$, let $\phi_\sigma(x)$ be the element of $[k]^J$ that takes $j \in J$ to $x_{\sigma^{-1}(j)}$. In other words, $\phi_\sigma$ takes
an element of $[k]^m$ and uses $\sigma$ to turn it into an element of $[k]^J$ in the obvious way. Given
$y \in [k]^{\bar{J}}$, we also define a map $\phi_{\sigma,y} : [k]^m \to [k]^n$ by taking $\phi_{\sigma,y}(x)$ to be $(\phi_\sigma(x), y)$. Thus,
$\phi_{\sigma,y}$ is a bijection between $[k]^m$ and the combinatorial subspace $S_{J,y} = \{(x,y) : x \in [k]^J\}$
(in which the wildcard sets are all singletons $\{i\}$ such that $i \in J$).

Now let $\nu$ be a probability measure on $[k]^m$. For each pair $(\sigma, y)$ as above, we can define
a probability measure $\nu_{\sigma,y}$ on $[k]^n$ by "copying" $\nu$ in the obvious way. That is, given a
subset $A \subseteq [k]^n$ we let $\nu_{\sigma,y}(A) = \nu(\phi_{\sigma,y}^{-1}(A))$.

We now show that if $m$ is sufficiently small, then the average of all the measures $\nu_{\sigma,y}$ is
close to the uniform measure on $[k]^n$.

Lemma 6.3. Let $\eta > 0$, let $k \geq 2$ be a positive integer, let $n \geq (16k/\eta)^8$, let $m \leq n^{1/4}$,
let $\nu$ be a probability measure on $[k]^m$ and let $\mu$ be the uniform measure on $[k]^n$. Then
$d(\mu, \mathbb{E}_{\sigma,y}\nu_{\sigma,y}) \leq \eta$, where the average is over all pairs $(\sigma, y)$ as defined above.

Proof. We shall prove the result in the case where all of $\nu$ is concentrated at a single point.
Since all other probability measures are convex combinations of these "delta measures"
(and their copies are the same convex combinations of the copies of the delta measures),
the result will follow.

Let $u$, then, be an element of $[k]^m$ and for each $C \subseteq [k]^m$ let $\nu(C) = 1$ if $u \in C$
and 0 otherwise. For each injection $\sigma : [m] \to [n]$ and each $y \in [k]^{\bar{J}}$ (where $\bar{J}$ is again
the complement of $\sigma([m])$), the measure $\nu_{\sigma,y}$ is the delta measure at $\phi_{\sigma,y}(u)$. That is,
$\nu_{\sigma,y}(A) = 1$ if $\phi_{\sigma,y}(u) \in A$ and $\nu_{\sigma,y}(A) = 0$ otherwise.

What, then, is $\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(A)$? To answer this, let us see what happens when $A$ is a singleton
$\{z\}$. Then $\nu_{\sigma,y}(A) = 1$ if and only if the restriction of $z$ to $J$ is $\phi_\sigma(u)$ and the restriction
of $z$ to $\bar{J}$ is $y$. So $\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(A)$ is the probability, for a randomly chosen pair $(\sigma, y)$, that
$z_{\sigma(i)} = u_i$ for every $i \in [m]$ and the restriction of $z$ to $\bar{J}$ is $y$.

For every $\sigma$, the probability of the second event given $\sigma$ is $k^{m-n}$, so it remains to calculate
the probability that $z_{\sigma(i)} = u_i$ for every $i$. For each $j \in [k]$, let $X_j = \{i : z_i = j\}$ and let $n_j$
be the cardinality of $X_j$. Now let us choose the values $\sigma(1), \sigma(2), \dots, \sigma(m)$ one at a time
and estimate the conditional probability that $\sigma(i) \in X_{u_i}$ given that $\sigma(h) \in X_{u_h}$ for every
$h < i$. If we set $p = \min_j n_j$ and $q = \max_j n_j$, then each conditional probability of this
kind will be at most $q/(n - m)$ and at least $(p - m)/(n - m)$.

Lemma 6.2 tells us that with probability at least $1 - 2k\exp(-2n^{1/4})$ we have the bounds
$n/k - n^{5/8} \leq p$ and $q \leq n/k + n^{5/8}$. If those bounds hold, then the probability that
$\sigma(i) \in X_{u_i}$ for every $i \in [m]$ lies between $(1/k - 2n^{-3/8})^m$ and $(1/k + 2n^{-3/8})^m$. (Here we
are using the inequality that $(n/k + n^{5/8})/(n - n^{1/4}) \leq 1/k + 2n^{-3/8}$, which holds if $k \geq 2$
and $n \geq 8$.) Therefore, it lies between $k^{-m}(1 - \eta/4)$ and $k^{-m}(1 + \eta/4)$. (This inequality is
valid if $n \geq (16k/\eta)^8$, as we are assuming.)

We have just shown that the value of the measure $\mathbb{E}_{\sigma,y}\nu_{\sigma,y}$ on a singleton $\{z\}$ is
approximately equal to the value taken by the uniform measure, provided that the singleton has
roughly the same number of coordinates of each value.

Let $B$ be the set of all "balanced" sequences $z$. That is, $B$ is the set of $z$ such that the
assumptions of the above argument are satisfied. Then $\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(B) \geq (1 - 2k\exp(-2n^{1/4}))(1 -
\eta/4) \geq 1 - \eta/2$, from which it follows that $\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(B^c) \leq \eta/2$. Therefore, if $A$ is any subset
of $[k]^n$, we have that

$$\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(A) \leq \mu(A)(1 + \eta/4) + \eta/2$$

and

$$\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(A) \geq \mu(A)(1 - \eta/4) - \eta/2.$$

Since $\mu(A) \leq 1$, it follows that $|\mu(A) - \mathbb{E}_{\sigma,y}\nu_{\sigma,y}(A)| \leq \eta$.

As commented at the beginning of the proof, the result for arbitrary $\nu$ follows from
this result, since we can write it as a convex combination of delta measures and apply the
triangle inequality. □
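
The hypotheses of Lemma 6.3 put $n$ far beyond what can be enumerated, but the quantity in the lemma can still be computed exactly for small parameters when $\nu$ is a delta measure, which gives some feeling for how the distance shrinks as $n$ grows with $m$ fixed. The Python sketch below is ours: it uses the observation from the proof that the averaged measure of $\{z\}$ is $k^{m-n}$ times the probability that a random injection $\sigma$ satisfies $z_{\sigma(i)} = u_i$ for all $i$, and it groups points $z$ by their slice $(n_1, \dots, n_k)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def falling(a, b):
    """a (a-1) ... (a-b+1); this is zero when b > a >= 0."""
    out = 1
    for t in range(b):
        out *= a - t
    return out

def compositions(n, k):
    """All k-tuples of non-negative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial(counts):
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

def tv_to_uniform(n, k, u):
    """Total variation distance between the uniform measure on [k]^n and the
    average over (sigma, y) of the copies of the delta measure at u in [k]^m."""
    m = len(u)
    c = [u.count(j) for j in range(1, k + 1)]
    excess = Fraction(0)
    for counts in compositions(n, k):
        # Probability that a random injection sigma has z_sigma(i) = u_i for
        # all i, when z has counts[j-1] coordinates equal to j.
        p_sigma = Fraction(1)
        for nj, cj in zip(counts, c):
            p_sigma *= falling(nj, cj)
        p_sigma /= falling(n, m)
        diff = p_sigma / k ** (n - m) - Fraction(1, k ** n)
        if diff > 0:
            excess += multinomial(counts) * diff
    return excess

for n in (12, 48):
    print(n, float(tv_to_uniform(n, 3, (1, 2))))
```

The function sums the positive parts of the pointwise differences, which for two probability measures equals the maximum over sets $A$ in the definition of total variation distance.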

Armed with this result, we now prove two statements that will be helpful to us later on. 

Corollary 6.4. Let $A$ be a subset of $[k]^n$ of uniform density $\delta$, let $\eta > 0$, let $m \leq n^{1/4}$
and suppose that $n \geq (16k/\eta)^8$. Let $J$ be a random subset of $[n]$ of size $m$ and let $y$ be a
random element of $[k]^{\bar{J}}$. Then the expected equal-slices density of $A$ inside the combinatorial
subspace $S_{J,y}$ is at least $\delta - \eta$. In particular, there exist $J$ and $y$ such that the equal-slices
density of $A$ inside $S_{J,y}$ is at least $\delta - \eta$.

Proof. Let $\nu$ be the equal-slices measure on $[k]^m$ and apply Lemma 6.3. It implies that
$\mathbb{E}_{\sigma,y}\nu_{\sigma,y}(A) \geq \delta - \eta$, from which it follows that there exists a pair $(\sigma, y)$ such that $\nu_{\sigma,y}(A) \geq
\delta - \eta$. But $\nu_{\sigma,y}$ is the equal-slices measure on the combinatorial subspace $S_{J,y}$, where
$J = \sigma([m])$, which is $m$-dimensional. □

For the next lemma we need some notation. Given a subset $J \subseteq [n]$ of size $m$ and a
sequence $y \in [k]^{\bar{J}}$, let us write $S'_{J,y}$ for the set of all sequences in $S_{J,y}$ that never take the
value $k$ in $J$. Thus, $S'_{J,y}$ is a copy of $[k-1]^m$. By the equal-slices density on $S'_{J,y}$ we mean
the image of the equal-slices density on $[k-1]^m$ (where this is considered as a set in itself
and not as a subset of $[k]^m$).

Corollary 6.5. Let $A$ be a subset of $[k]^n$ of uniform density $\delta$, let $\eta > 0$, let $m \leq n^{1/4}$
and suppose that $n \geq (16k/\eta)^8$. Let $J$ be a random subset of $[n]$ of size $m$ and let $y$ be
a random element of $[k]^{\bar{J}}$. Then the expected equal-slices density of $A$ inside the set $S'_{J,y}$
is at least $\delta - \eta$. In particular, there exist $J$ and $y$ such that the equal-slices density of $A$
inside $S'_{J,y}$ is at least $\delta - \eta$.

Proof. Let $\nu'$ be the measure on $[k]^m$ defined by taking $\nu'(A)$ to be the equal-slices measure
of $A \cap [k-1]^m$ (considered as a subset of $[k-1]^m$). In other words, $\nu'(A)$ is the probability
that $x \in A$ if you choose a random $(k-1)$-tuple $(r_1, \dots, r_{k-1})$ of positive integers that add
up to $m$ and then let $x$ be a random element of $[k-1]^m$ with $r_j$ $j$s for each $j$.

Applying Lemma 6.3, we find that $\mathbb{E}_{\sigma,y}\nu'_{\sigma,y}(A) \geq \delta - \eta$, from which it follows that there
exists a pair $(\sigma, y)$ such that $\nu'_{\sigma,y}(A) \geq \delta - \eta$. But $\nu'_{\sigma,y}$ is the equal-slices measure on the
set $S'_{J,y}$, where $J = \sigma([m])$. □

6.2. From equal-slices measure to uniform measure. We would now like to go in the
other direction, passing from a set of equal-slices density $\delta$ to a subspace inside which the
uniform density is at least $\delta - \eta$ for some small $\eta$. As before, we need to use a result that
says that a typical sequence $x$ is not too imbalanced. Since we are choosing $x$ from the
equal-slices measure, the conclusion we can hope for is much weaker than the conclusion
of Lemma 6.2: the result we use is Lemma 3.5, which tells us that with high probability
every value will be taken a reasonable number of times.

The result we prove in this subsection states that if $A$ has equal-slices density $\delta$, then
there is a distribution on the $m$-dimensional subspaces of $[k]^n$ such that if you choose one
at random then the expected uniform density of $A$ in that subspace is at least $\delta - \beta$.

Lemma 6.6. Let $\delta, \beta > 0$, let $m$, $n$ and $k$ be positive integers, and suppose that $m \leq
\min\{\beta n/8k, \beta n/2k^2\}$. Let $A$ be a subset of $[k]^n$ of equal-slices density $\delta$. Let $J$ be a random
subset of $[n]$ of size $m$, let $x$ be chosen uniformly at random from $[k]^J$ and let $y$ be chosen
randomly, according to equal-slices measure, from $[k]^{\bar{J}}$ (with this choice made independently
of $x$). Then the probability that $(x,y) \in A$ is between $\delta - \beta$ and $\delta + \beta$.

Proof. Let $z$ be an element of $[k]^n$. We shall estimate the probability that $(x, y) = z$,
when $x$ and $y$ are chosen as in the statement of the lemma, and compare that with the
equal-slices probability of the singleton $\{z\}$. To do this, let us define $u_j$, for each $j \in [k]$,
to be the number of $i$ such that $z_i = j$. Let us also assume that $u_j \geq 1$ for every $j$.

We start by considering the case $m = 1$. In other words, we first pick a random $i$ and
randomly choose some $j \in [k]$. Then we randomly choose $y$ from equal-slices measure on
$[k]^{[n] \setminus \{i\}}$. And then we would like to know the probability that $j = z_i$ and $y_h = z_h$ for
every $h \neq i$.

The probability that $j = z_i$ is $1/k$, since we chose $j$ uniformly. Now let us suppose that
$z_i$ is in fact equal to 1. Then the probability that $y_h = z_h$ for every $h \neq i$ is the equal-slices
measure of a singleton that consists of a sequence in $[k]^{n-1}$ with $u_1 - 1$ 1s and $u_j$ $j$s
for every $j > 1$. That measure is equal to

$$\binom{n+k-2}{k-1}^{-1} \binom{n-1}{u_1 - 1, u_2, \dots, u_k}^{-1}.$$

(It is here that we are assuming that $u_1 \neq 0$.) For comparison, the equal-slices measure of
$\{z\}$ in $[k]^n$ is

$$\binom{n+k-1}{k-1}^{-1} \binom{n}{u_1, u_2, \dots, u_k}^{-1}.$$

The first measure divided by the second equals $(n + k - 1)/u_1$.

It follows that the probability that $(x,y) = z$ given that $z_i = 1$ is $(n + k - 1)/ku_1$ times
the equal-slices probability of $z$. Therefore, by the law of total probability, the probability
that $(x, y) = z$ is

$$\sum_{j=1}^{k} \frac{1}{k} \cdot \frac{u_j}{n} \cdot \frac{n+k-1}{u_j} = \frac{n+k-1}{n}$$

times the equal-slices probability of $z$.
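
The case $m = 1$ can be confirmed exactly for small parameters. The Python sketch below is ours; it assumes the equal-slices measure of a singleton is given by the product formula displayed above (a slice chosen uniformly among the $\binom{n+k-1}{k-1}$ possibilities, then a uniform point of that slice), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def equal_slices(z, k):
    """Equal-slices measure of the singleton {z} in [k]^n, where n = len(z)."""
    n = len(z)
    multinom = factorial(n)
    for j in range(1, k + 1):
        multinom //= factorial(z.count(j))
    return Fraction(1, comb(n + k - 1, k - 1) * multinom)

def mixed_probability(z, k):
    """P((x, y) = z) when a coordinate i is chosen uniformly, x_i is uniform
    in [k], and y is equal-slices on the remaining n - 1 coordinates."""
    n = len(z)
    total = Fraction(0)
    for i in range(n):
        rest = z[:i] + z[i + 1:]
        total += Fraction(1, n) * Fraction(1, k) * equal_slices(rest, k)
    return total

n, k = 5, 3
ratios = {
    mixed_probability(z, k) / equal_slices(z, k)
    for z in product(range(1, k + 1), repeat=n)
    if all(z.count(j) >= 1 for j in range(1, k + 1))
}
print(ratios)
```

Every $z$ that takes each value at least once gives the same ratio, namely $(n + k - 1)/n$. Dropping the condition $u_j \geq 1$ in the comprehension shows why the assumption is needed, since sequences that miss a value give a smaller ratio.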

Now let us consider the more general case where $|J| = m$. Again we shall look at the
probability that $(x,y) = z$, but this time we shall assume that $u_j \geq m$ for every $j$. We
claim that the probability of getting $z$ is

$$r_{n,k,m} = \frac{(n+k-1)(n+k-2)\cdots(n+k-m)}{n(n-1)\cdots(n-m+1)}$$

times the equal-slices measure of $\{z\}$. This follows easily from what we have just done and
induction. Indeed, by induction we know that if we choose a random set $J'$ of size $m - 1$
and choose $x$ uniformly from $[k]^{J'}$ and $y$ using equal-slices from $[k]^{\bar{J'}}$, then the probability
that $(x,y) = z$ is $r_{n,k,m-1}$ times the equal-slices measure of $\{z\}$. If we now change the way
we choose $y$ by uniformly picking one coordinate and using equal-slices to pick the rest,
then by the case $m = 1$ we multiply this probability by a further $(n + k - m)/(n - m + 1)$,
which gives us $r_{n,k,m}$ times the equal-slices measure of $\{z\}$.

Now $r_{n,k,m}$ is at least 1 and at most $(1 + k/(n - m))^m$. Given our assumption about
$m$, this is at most $1 + \beta/2$. Thus, for every $z$ with at least $m$ coordinates of each value,
the probability that $(x,y) = z$ lies between $\nu(\{z\})$ and $(1 + \beta/2)\nu(\{z\})$, where $\nu$ is
equal-slices measure. By Lemma 3.5, the equal-slices probability that $z$ does not have at least
$m$ coordinates of each value is at most $mk^2/n$, which by assumption is at most $\beta/2$.
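
For readers who want the estimate on $r_{n,k,m}$ spelled out, one way to obtain it is sketched below. The additional assumption $\beta \leq 1$ is ours, made only to keep the arithmetic short, and the elementary inequality $e^x \leq 1 + 2x$ for $0 \leq x \leq 1$ is used in the last step.

```latex
% Each factor of r_{n,k,m} has the form (n+k-1-t)/(n-t) with 0 <= t <= m-1.
r_{n,k,m} \;=\; \prod_{t=0}^{m-1} \frac{n+k-1-t}{n-t}
          \;=\; \prod_{t=0}^{m-1} \Bigl(1 + \frac{k-1}{n-t}\Bigr),
\qquad\text{so}\qquad
1 \;\le\; r_{n,k,m} \;\le\; \Bigl(1 + \frac{k}{n-m}\Bigr)^{m}
  \;\le\; \exp\Bigl(\frac{km}{n-m}\Bigr).

% Assumption (ours): beta <= 1. Then m <= beta n / 8k <= n/2, hence
\frac{km}{n-m} \;\le\; \frac{2km}{n} \;\le\; \frac{\beta}{4},
\qquad\text{and therefore}\qquad
r_{n,k,m} \;\le\; e^{\beta/4} \;\le\; 1 + \frac{\beta}{2}.
```

The second condition $m \leq \beta n/2k^2$ plays no part in this estimate; it is what makes the bound $mk^2/n$ on the exceptional set at most $\beta/2$.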

Now let $A$ be any subset of $[k]^n$ of density $\delta$ and let $B$ be the set of all sequences such that
for some $j$ there are fewer than $m$ coordinates equal to $j$. Then if we choose $(x, y)$ randomly
in the manner stated, the probability that it belongs to $A$ is at most $(1 + \beta/2)\nu(A) + \beta/2$,
since the probability that it belongs to $B$ is at most the equal-slices measure of $B$ (as we
see by looking at $B^c$). The probability is also at least $\nu(A) - \beta/2$, for similar reasons. This
proves the lemma. □

We now show that DHJ implies the equal-slices version of DHJ (which we stated earlier
as Theorem 3.7).

Corollary 6.7. Let $k$ be a positive integer and suppose that DHJ$_k$ is true. Let $\delta > 0$ and
let $n \geq (16k^2/\delta)\mathrm{DHJ}(k, \delta/2)$. Then every subset of $[k]^n$ of equal-slices density at least $\delta$
contains a combinatorial line.

Proof. By Lemma 6.6, there exists a combinatorial subspace $V$ of dimension not less than
$\mathrm{DHJ}(k, \delta/2)$ such that the uniform density of $A$ in $V$ is at least $\delta/2$. The result follows. □

6.3. From uniform measure on $[k]^n$ to uniform measure on $[k-1]^m$. We need one
more result of a similar kind. This time it says that if we choose a random set $J \subseteq [n]$
of size $m$ and choose $y$ uniformly at random from $[k]^{\bar{J}}$ and $x$ uniformly at random from
$[k-1]^J$, then the distribution of $(x,y)$ is approximately uniform. This can be proved as
another almost immediate corollary of Lemma 6.3. However, we shall give a direct proof
instead, since this case is an easy one and the proof is short.

Lemma 6.8. Let $\eta > 0$ and let $m$ and $n$ be positive integers with $m \leq n^{1/4}$ and $n \geq
(12/\eta)^8$. Let $J$ be a random subset of $[n]$ of size $m$, let $y$ be a random element of $[k]^{\bar{J}}$ and
let $x$ be a random element of $[k-1]^J$ (in both cases chosen uniformly). Then the total
variation distance between the resulting distribution on $(x, y)$ and the uniform distribution
on $[k]^n$ is at most $\eta$.

Proof. Let $z$ be an element of $[k]^n$, let $X$ be the set of coordinates $i$ such that $z_i = k$ and let
$r$ be the cardinality of $X$. By the proof of Lemma 6.2, the probability that $r$ lies between
$n/k - n^{5/8}$ and $n/k + n^{5/8}$ is at least $1 - 2\exp(-2n^{1/4})$, which is at least $1 - \eta/3$. Let us
assume that $z$ has this property. Now choose $J$ and let us calculate the probability that
$(x, y) = z$ conditional on this choice of $J$.

If $J \cap X \neq \emptyset$, then the probability is zero. If, however, $J \cap X = \emptyset$, then it is
$(k-1)^{-m}k^{-(n-m)}$. The probability that $J \cap X = \emptyset$ is $\binom{n-r}{m}\binom{n}{m}^{-1}$, which lies between
$(1 - 1/k - n^{-3/8} - m/n)^m$ and $(1 - 1/k + n^{-3/8})^m$. A simple calculation shows that it therefore lies
between $(1 - 1/k)^m(1 - 4n^{-1/8})$ and $(1 - 1/k)^m(1 + 4n^{-1/8})$. Therefore, the probability
that $(x,y) = z$ lies between $k^{-n}(1 - \eta/3)$ and $k^{-n}(1 + \eta/3)$.

Let $B$ be the set of all $z$ such that $r$ does not lie between $n/k - n^{5/8}$ and $n/k + n^{5/8}$.
Then the probability that $(x, y) \in B$ is at most $1 - (1 - \eta/3)^2 \leq 2\eta/3$. Therefore, if $A$ is
any subset of $[k]^n$ and $\delta$ is the density of $A$, the probability that $(x,y) \in A$ lies between
$(\delta - \eta/3)(1 - \eta/3)$ and $\delta(1 + \eta/3) + 2\eta/3$, which proves the lemma. □

7. A dense set with no combinatorial line correlates locally with an intersection of insensitive sets

In this section we shall carry out the first three stages of the proof of DHJ$_k$ (corresponding
to the first three stages of the sketch proofs given earlier of the corners theorem and DHJ$_3$).

7.1. Finding a dense diagonal. Let $A$ be a subset of $[k]^n$ of density $\delta$. The aim of this
subsection is to find a combinatorial subspace $V$ of $[k]^n$ with two properties. First, the
density of $A$ inside $V$ is not much smaller than $\delta$, and second, there are many points of $A$
in $V$ for which the variable coordinates take values in $[k-1]$. The densities in both cases
are with respect to equal-slices measure. The second statement corresponds to the title
of this subsection: this step is analogous to finding a dense diagonal in the corners proof.
However, that proof gave us a dense structured set that was disjoint from $A$. Here, what
we get is a structured set that is dense in a subspace. This will not help us at all unless $A$
still has density almost $\delta$ (or better) in that subspace. Thus, there is slightly more to this
step than there was in the corners proof.

Lemma 7.1. Let $A \subseteq [k]^n$ be a set of uniform density $\delta$, let $0 < \eta \leq \delta/4$, let $m \leq n^{1/4}$
and suppose that $n \geq (16k/\eta)^8$. Then there exists a pair $(J, y)$, where $J$ is a subset of $[n]$
of size $m$ and $y \in [k]^{\bar{J}}$, such that one of the following two possibilities holds:

(i) the equal-slices density of $A$ in the subspace $S_{J,y}$ is at least $\delta + \eta$;

(ii) the equal-slices density of $A$ in $S_{J,y}$ is at least $\delta - 4\eta\delta^{-1}$ and the equal-slices density
of $A$ in $S'_{J,y}$ is at least $\delta/4$.

Proof. By Corollary 6.4, if we choose $J$ and $y$ randomly then the expected equal-slices
density of $A$ in $S_{J,y}$ is at least $\delta - \eta$. If the density is never more than $\delta + \eta$, then the
probability that it is less than $\delta - 4\eta\delta^{-1}$ is less than $\delta/2$, since otherwise the average would
be at most

$$(1 - \delta/2)(\delta + \eta) + (\delta/2)(\delta - 4\eta\delta^{-1}) = \delta + (1 - \delta/2)\eta - 2\eta < \delta - \eta,$$

a contradiction.

By Corollary 6.5, the average density of $A$ in a random set $S'_{J,y}$ is at least $\delta - \eta$. Therefore,
the probability that $A$ has density less than $\delta/4$ in $S'_{J,y}$ is at most $1 - \delta/2$, since otherwise
the average would be at most

$$\delta/2 + (1 - \delta/2)(\delta/4) < 3\delta/4 \leq \delta - \eta,$$

another contradiction.

It follows that if (i) does not hold then with positive probability (ii) holds. □ 

What Lemma 7.1 tells us is that either we can pass to a subspace and get a density
increment of $\eta$, in which case we can move to the next stage of the iteration (after passing
to a further subspace to convert this density increment into a uniform density increment),
or we find a "dense diagonal" in a subspace in which $A$ has not lost a significant amount
of density.

7.2. A "simple" locally dense set that is almost disjoint from A. Let us suppose
that the second possible conclusion of Lemma 7.1 holds (for an $\eta$ that we are free to choose
later). Then we have a combinatorial subspace $V$ of $m$ dimensions and $A$ contains many
points in $V$ for which the variable coordinates are all in $[k-1]$. For simplicity, and without
loss of generality, let us assume that $V = [k]^m$, and let us write $A$ for $A \cap V$. So we are
given that $A$ has equal-slices density at least $\delta - \gamma$ (where $\gamma = 4\eta\delta^{-1}$) and inside $[k-1]^m$
has equal-slices density at least $\delta/4$. Let us write $B$ for $A \cap [k-1]^m$. Finally, if $x \in [k]^m$
and $i, j \in [k]$, let us write $x^{i \to j}$ for the sequence that turns all the $i$s of $x$ into $j$s.

Lemma 7.2. Let $A$ be a subset of $[k]^m$ that contains no combinatorial line, and let $B =
A \cap [k-1]^m$. For each $j \leq k - 1$ let $C_j$ be the set $\{x \in [k]^m : x^{k \to j} \in B\}$. Then $C_j$ is a
$jk$-insensitive set, and $A \cap C_1 \cap \dots \cap C_{k-1} \subseteq [k-1]^m$.

Proof. Since the condition for belonging to $C_j$ depends only on $x^{k \to j}$, it is trivial that $C_j$
is $jk$-insensitive.

Suppose now that $x \in C_1 \cap \dots \cap C_{k-1}$ and that at least one coordinate of $x$ takes the value
$k$. Let $X$ be the set of coordinates where $x$ takes the value $k$. Then for each $j \leq k - 1$, if you
change all the coordinates in $X$ to $j$, you end up with a point that belongs to $A$, since $x \in C_j$.
Therefore, since $A$ contains no combinatorial line, it follows that $x$ itself does not belong to $A$. □
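
Lemma 7.2 is easy to test on a small instance. The Python sketch below is ours and is not part of the proof: it builds a line-free subset $A$ of $[3]^3$ greedily (any line-free set would do), forms $B$ and the sets $C_j$ exactly as in the statement, and checks both conclusions. The helper names (`lines`, `greedy_line_free`, `replace`, `is_jk_insensitive`) are hypothetical.

```python
from itertools import product

def lines(k, m):
    """Yield each combinatorial line in [k]^m as a tuple of its k points."""
    for template in product(list(range(1, k + 1)) + ["*"], repeat=m):
        if "*" in template:
            yield tuple(
                tuple(j if t == "*" else t for t in template)
                for j in range(1, k + 1)
            )

def greedy_line_free(k, m):
    """Add points of [k]^m in lexicographic order, skipping any point that
    would complete a combinatorial line."""
    all_lines = list(lines(k, m))
    A = set()
    for x in product(range(1, k + 1), repeat=m):
        A.add(x)
        if any(all(p in A for p in line) for line in all_lines):
            A.remove(x)
    return A

def replace(x, i, j):
    """The sequence x^{i -> j}: every i in x becomes j."""
    return tuple(j if t == i else t for t in x)

k, m = 3, 3
cube = list(product(range(1, k + 1), repeat=m))
A = greedy_line_free(k, m)
B = {x for x in A if k not in x}                      # A intersected with [k-1]^m
C = {j: {x for x in cube if replace(x, k, j) in B} for j in range(1, k)}
C_all = set.intersection(*C.values())

def is_jk_insensitive(S, j, k):
    """Membership of x in S depends only on x^{k -> j}."""
    return all((x in S) == (replace(x, k, j) in S) for x in cube)

print(all(is_jk_insensitive(C[j], j, k) for j in C))
print(all(k not in x for x in A & C_all))
```

Both checks succeed for this $A$, as the lemma predicts. The test for insensitivity uses the fact that a set is $jk$-insensitive exactly when membership is unchanged on passing from $x$ to $x^{k \to j}$.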

Lemma 7.3. Let $A$, $B$ and $C_1, \dots, C_{k-1}$ be the subsets of $[k]^m$ defined in Lemma 7.2 and
let $C = C_1 \cap \dots \cap C_{k-1}$. Then for every $\delta > 0$ there exists $\theta > 0$ such that if $B$ has
equal-slices density at least $\delta/4$ in $[k-1]^m$ then $C \setminus [k-1]^m$ has equal-slices density at
least $\theta$ in $[k]^m$.

Proof. There is a one-to-one correspondence between combinatorial lines in $B$ and points
in $C \setminus [k-1]^m$. Moreover, this one-to-one correspondence preserves equal-slices measure
(for the trivial reason that we defined the equal-slices measure on the set of combinatorial
lines in $[k-1]^m$ by treating them as points in $[k]^m$). By the probabilistic version of DHJ$_{k-1}$
there exists $\theta = \mathrm{PDHJ}(k-1, \delta/4) > 0$ such that the equal-slices density of combinatorial
lines in $B$ is at least $\theta$. □

From this lemma and Lemma 3.2 we see that $\nu(A \cap C) \leq (k/\theta m)\nu(C)$. (Recall that $\nu$ is
the equal-slices measure.) If $m$ is large enough, the factor $k/\theta m$ will be significantly less
than $\delta$. This is the sense in which $A$ is "almost disjoint" from $C$.

7.3. A "simple" locally dense set that correlates with A. 

Lemma 7.4. Let $A$, $B$ and $C_1, \dots, C_{k-1}$ be the subsets of $[k]^m$ defined in Lemma 7.2, let
$C = C_1 \cap \dots \cap C_{k-1}$, and suppose that $C$ has density $\theta$. Let $0 < \gamma \leq \delta/4$ and suppose also
that $\nu(A) \geq \delta - \gamma$ and that $\nu(A \cap C) \leq (\delta/2)\nu(C)$. Then there exist sets $D_1, \dots, D_{k-1}$ such
that $D_i$ is $ik$-insensitive for each $i$ and such that $\nu(A \cap D) \geq (\delta - \gamma)\nu(D) + \delta\theta/4k$, where
$D = D_1 \cap \dots \cap D_{k-1}$.

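Proof. We begin with the observation that

$$[k]^m = \bigcup_{i=1}^{k} C_1 \cap \dots \cap C_{i-1} \cap C_i^c \cap [k]^m \cap \dots \cap [k]^m$$

and that this union is in fact a partition of $[k]^m$. (Each term is an intersection of $k - 1$ sets,
so for $i = k$ no complement appears.) For each $i$ let us write $D^{(i)}$ for the set
$C_1 \cap \dots \cap C_{i-1} \cap C_i^c \cap [k]^m \cap \dots \cap [k]^m$. Then $D^{(k)} = C$. From our assumptions, we know that

$$\begin{aligned}
\nu(A \cap (D^{(1)} \cup \dots \cup D^{(k-1)})) &\geq \delta - \gamma - (\delta/2)\nu(D^{(k)}) \\
&= (\delta - \gamma)(1 - \nu(D^{(k)})) + (\delta/2 - \gamma)\nu(D^{(k)}) \\
&\geq (\delta - \gamma)(1 - \nu(D^{(k)})) + \delta\theta/4.
\end{aligned}$$

Since $1 - \nu(D^{(k)}) = \nu(D^{(1)} \cup \dots \cup D^{(k-1)})$, it follows by averaging that there exists $i$ such
that $\nu(A \cap D^{(i)}) \geq (\delta - \gamma)\nu(D^{(i)}) + \delta\theta/4(k-1)$. Since both $C_i$ and $C_i^c$ are $ik$-insensitive,
this proves the lemma. □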

Now for the next part of our argument we need to use the uniform measure. In order to
do this, we must use our measure-transfer results again. Basically, all we do is randomly
restrict to a small subspace $V$ with the uniform measure on it and apply Lemma 6.6, but
that is not quite the whole story since we want two things to happen: that the relative
density of $A \cap D \cap V$ inside $D \cap V$ is still bigger than $\delta$, and also that the relative density
of $D \cap V$ inside $V$ is not too small.

Lemma 7.5. Let /3 > and let k, r and m be positive integers such that r < min{/3m/8/c, 
/3m/2fc2}. Let A and D be subsets o/[/c]"' such thatu{AnD) > {5-'y)u{D)+3f3 . Then there 
exists a combinatorial subspace V of [k]"^ of dimension r such that nv{D r\V)> 7/i(V^) 
and n D n V) > {6 — 'j)fj,v{D nV) + (3, where fj,v is the uniform probability measure 
on V . 

Proof. Let us choose $V$ by randomly choosing a set $J\subset[m]$ of size $r$, randomly choosing
$y\in[k]^{[m]\setminus J}$ using equal-slices measure, and taking the subspace $S_{J,y}$. By Lemma 6.3 the
expectation of $\mu_V(A\cap D\cap V)-(\delta-\gamma)\mu_V(D\cap V)$ is at least $\nu(A\cap D)-\beta-(\delta-\gamma)\nu(D)-\beta$,
which is at least $\beta$ by our assumed lower bound for $\nu(A\cap D)$. □
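
The random choice of $V$ in this proof can be sketched in Python as follows. This is a hedged illustration, not the paper's construction verbatim: we assume that equal-slices measure means choosing the slice (the vector of letter counts) uniformly at random and then a uniformly random point of that slice, and that $S_{J,y}$ is the subspace whose free coordinates are exactly those in $J$, with the remaining coordinates fixed by $y$. The function names are ours.

```python
import random


def equal_slices_point(length, k, rng):
    """Sample from [k]^length: uniform slice (a_1, ..., a_k), then uniform point in it."""
    # Stars and bars: choose k-1 bar positions among length + k - 1 slots.
    bars = sorted(rng.sample(range(length + k - 1), k - 1))
    edges = [-1] + bars + [length + k - 1]
    counts = [edges[i + 1] - edges[i] - 1 for i in range(k)]
    point = [v for v, a in enumerate(counts, start=1) for _ in range(a)]
    rng.shuffle(point)  # uniform arrangement of the chosen letter counts
    return point


def random_subspace(m, r, k, rng):
    """Return (J, template): template is None on the free set J and equals y elsewhere."""
    J = set(rng.sample(range(m), r))
    y = iter(equal_slices_point(m - r, k, rng))
    template = [None if i in J else next(y) for i in range(m)]
    return J, template


rng = random.Random(1)
J, template = random_subspace(m=12, r=3, k=3, rng=rng)
```

Averaging the quantity $\mu_V(A\cap D\cap V)-(\delta-\gamma)\mu_V(D\cap V)$ over many such samples would give an empirical version of the expectation bounded in the proof.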

Note that the conclusion of the lemma implies that $\mu_V(D\cap V)$ is at least $\beta$, since
$\mu_V(A\cap D\cap V)\leq\mu_V(D\cap V)$.
Let us now put together the results of this section.

Lemma 7.6. Let $\delta>0$, let $k$ be a positive integer, let $\theta=$ PDHJ$(k-1,\delta/4)$, let $\eta=
\delta^2\theta/96k$, let $\beta=\delta\theta/12k$ and let $\gamma=4\delta^{-1}\eta=\delta\theta/24k=\beta/2$. Let $n$ be a positive integer,
let $m=\lfloor n^{1/4}\rfloor$, let $r=\lfloor\beta m/8k^2\rfloor$ and suppose that $n\geq(16k/\eta)^{12}$. Let $A$ be a subset of
$[k]^n$ of uniform density $\delta$. Then either $A$ contains a combinatorial line or there is an $r$-
dimensional combinatorial subspace $W$ of $[k]^n$ and sets $D_1,\dots,D_{k-1}\subset W$ such that $D_j$ is
$jk$-insensitive for each $j$, and such that if we set $D$ to be $D_1\cap\dots\cap D_{k-1}$, then $\mu_W(D)\geq\gamma$
and $\mu_W(A\cap D)\geq(\delta+\gamma)\mu_W(D)$.
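
The relations between the parameters in this statement can be checked with exact rational arithmetic. The sketch below assumes the readings $\eta=\delta^2\theta/96k$, $\beta=\delta\theta/12k$ and $\gamma=4\eta/\delta$; the function name and the sample values of $\delta$, $\theta$ and $k$ are ours and purely illustrative.

```python
from fractions import Fraction


def lemma_7_6_parameters(delta, theta, k):
    """Return (eta, beta, gamma) as read from the statement of the lemma."""
    eta = delta**2 * theta / (96 * k)
    beta = delta * theta / (12 * k)
    gamma = 4 * eta / delta
    return eta, beta, gamma


# Sample values only; theta stands for PDHJ(k-1, delta/4).
delta, theta, k = Fraction(1, 10), Fraction(1, 3200), 3
eta, beta, gamma = lemma_7_6_parameters(delta, theta, k)

# gamma = delta*theta/24k = beta/2, and 3*beta = delta*theta/4k is exactly the
# surplus that Lemma 7.4 provides and Lemma 7.5 requires.
relations_hold = (gamma == delta * theta / (24 * k) == beta / 2
                  and 3 * beta == delta * theta / (4 * k))
```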

Proof. Let $m=\lfloor n^{1/4}\rfloor$. Then, by Lemma 7.1, either there is an $m$-dimensional subspace
$V$ such that the density of $A$ in $V$ is at least $\delta+\eta$, in which case we are done (since we can
pass to a random $r$-dimensional subspace of $V$ and on average we will have the same density
increment), or there is an $m$-dimensional subspace $V$ such that the equal-slices density of $A$
in $V$ is at least $\delta-4\eta\delta^{-1}$ and the equal-slices density of $A$ in $V'$ is at least $\delta/4$, where $V'$
is the set of points in $V$ with no variable coordinate equal to $k$.

Let $B=A\cap V'$. Then Lemma 7.3 gives us a $\theta>0$ and sets $C_1,\dots,C_{k-1}$ such that
$C_i$ is $ik$-insensitive, the intersection $C=C_1\cap\dots\cap C_{k-1}$ is such that $C\setminus V'$ has equal-
slices density at least $\theta$, and $C\setminus V'$ is disjoint from $A$. The value of $\theta$ can be taken to be
PDHJ$(k-1,\delta/4)$.

Let $\gamma=4\eta\delta^{-1}=\beta/2$. It is easily checked that $k/\theta m\leq\delta/2$ and that $\delta\theta/4k\geq2\gamma$.
Therefore, Lemma 7.4 tells us that we can find sets $D_1,\dots,D_{k-1}$ such that $D_i$ is $ik$-
insensitive, and such that if $D=D_1\cap\dots\cap D_{k-1}$, then $\nu(A\cap D)\geq(\delta-\gamma)\nu(D)+\delta\theta/4k$.

Finally, Lemma 7.5 with $\beta=\delta\theta/12k$ gives us an $r$-dimensional subspace $W$ of $V$ such
that $\mu_W(A\cap D\cap W)\geq(\delta-\gamma)\mu_W(D\cap W)+\beta$. This implies that $\mu_W(A\cap D\cap W)\geq
(\delta+\gamma)\mu_W(D\cap W)$ and that $\mu_W(D\cap W)\geq\gamma$, as claimed. □

8. Almost partitioning low-complexity sets into subspaces

We have completed one of the two main stages of the proof, which corresponds to the
first three steps of the proof we sketched of the corners theorem (and also to the first
three steps of our sketch proof of DHJ$_3$). In this section we shall carry out a task that
corresponds to the next two steps. So far, we have obtained a density increment on a dense
subset $D$ of a subspace $W$. This helps us, because $D$ is an intersection of $ik$-insensitive sets,
and therefore has low complexity, in a certain useful sense. Our job now is to show that
low-complexity sets can be almost completely partitioned into combinatorial subspaces
with dimension tending to infinity. To prove this, we shall follow the scheme of argument
presented in Section 5.4. (That argument was presented for the case $k=3$, but it can be
straightforwardly generalized.)

8.1. A $1k$-insensitive set can be almost entirely partitioned into large subspaces.

We begin by proving the result for $1k$-insensitive sets, and hence for $jk$-insensitive sets
whenever $j<k$. It will then be straightforward to deduce the result for intersections of
such sets.

Lemma 8.1. Let $\eta>0$, and let $d$, $m$ and $n$ be positive integers with $m\geq$ MDHJ$_{k-1}(d,\eta)$
and $n\geq\eta^{-1}m(k+d)^m$. Let $D$ be a $1k$-insensitive subset of $[k]^n$. Then there are disjoint
combinatorial subspaces $V_1,\dots,V_N$, each of which has dimension $d$ and is a subset of $D$,
such that $\mu(V_1\cup\dots\cup V_N)\geq\mu(D)-3\eta$.

Proof. Let us write a typical element of $[k]^n$ as $(x,y)$, where $x\in[k]^m$ and $y\in[k]^{n-m}$. For
each $y$ let us write $D_y$ for the set $\{x\in[k]^m:(x,y)\in D\}$ and $E_y$ for the set $\{x\in[k-1]^m:
(x,y)\in D\}=D_y\cap[k-1]^m$. Then by Lemma 6.8 the average density of the sets $E_y$ is at
least $\gamma-\eta\geq2\eta$ (here $\gamma$ denotes the density of $D$, which we may take to be at least $3\eta$,
since otherwise the conclusion is trivial). It follows that the density of $y$ such that $E_y$ has
density at least $\eta$ (in $[k-1]^m$) is at least $\eta$.

If $E_y$ has density at least $\eta$, then by our assumption about $m$ it follows that it contains a
$d$-dimensional combinatorial subspace $U'_y$ (where this means a subspace of $[k-1]^m$). Since
$D$ is $1k$-insensitive, and therefore so is $D_y$, it follows that $D_y$ contains a $d$-dimensional
combinatorial subspace $U_y$ (where this means a subspace of $[k]^m$).
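
The step from a subspace of $[k-1]^m$ to a subspace of $[k]^m$ can be tested on a small example. The sketch below assumes that a set is $ik$-insensitive if changing any single coordinate from $i$ to $k$, or from $k$ to $i$, never leaves the set. It represents a subspace by a template in which integers are fixed values and strings name wildcards; all function names are ours.

```python
from itertools import product


def flips(x, i, k):
    """All points obtained from x by changing one coordinate between the values i and k."""
    for pos, v in enumerate(x):
        if v in (i, k):
            yield x[:pos] + ((k if v == i else i),) + x[pos + 1:]


def is_ik_insensitive(D, i, k):
    """True if no single i <-> k change leads out of D."""
    return all(y in D for x in D for y in flips(x, i, k))


def close_under_swaps(seed, i, k):
    """Smallest ik-insensitive set containing `seed`."""
    D, stack = set(seed), list(seed)
    while stack:
        for y in flips(stack.pop(), i, k):
            if y not in D:
                D.add(y)
                stack.append(y)
    return D


def subspace_points(template, alphabet):
    """Points of the subspace given by `template` over the given alphabet."""
    names = sorted({t for t in template if isinstance(t, str)})
    pts = set()
    for vals in product(alphabet, repeat=len(names)):
        sub = dict(zip(names, vals))
        pts.add(tuple(sub.get(t, t) for t in template))
    return pts


k = 3
template = ("a", 2, "b", 1)                         # fixed values lie in [k-1]
U_small = subspace_points(template, range(1, k))    # subspace of [k-1]^m
D_y = close_under_swaps(U_small, 1, k)              # a 1k-insensitive set containing it
U_big = subspace_points(template, range(1, k + 1))  # same template read in [k]^m
```

In this toy instance the lifted subspace `U_big` is contained in `D_y`, which is the containment the paragraph above asserts in general.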

The number of possible $d$-dimensional subspaces of $[k]^m$ is at most $(k+d)^m$ (since we
have to decide for each coordinate $i\in[m]$ whether to give it a fixed value in $[k]$ or to
put it into one of the $d$ wildcard sets), so by the pigeonhole principle there must exist a
subspace $U\subset[k]^m$ such that the set $T=\{y\in[k]^{n-m}:U\times\{y\}\subset D\}$ has density at least
$\eta(k+d)^{-m}$. Since $D$ is $1k$-insensitive, it follows that $T$ is also $1k$-insensitive.
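
The counting bound $(k+d)^m$ can be confirmed by brute force for small parameters. In the sketch below a subspace is encoded as a string of length $m$ over the $k$ letters together with $d$ labelled wildcard symbols, each of which must appear at least once. Labelling the wildcards counts each subspace several times, so this is an upper bound on the number of subspaces rather than an exact count; the function name is ours.

```python
from itertools import product


def count_subspace_templates(k, m, d):
    """Count length-m strings over [k] plus d wildcard symbols that use every wildcard."""
    symbols = list(range(1, k + 1)) + [f"w{j}" for j in range(d)]
    wild = set(symbols[k:])
    return sum(1 for s in product(symbols, repeat=m) if wild <= set(s))


small = count_subspace_templates(2, 3, 1)   # 3^3 strings minus the 2^3 with no wildcard
larger = count_subspace_templates(3, 4, 2)
```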

The set $U\times T$ is a subset of $D$ of density at least $\eta(k+d)^{-m}$, and it is a union of the
$d$-dimensional subspaces $U\times\{y\}$ with $y\in T$. We now remove $U\times T$ from $D$.

The resulting set $D_1=D\setminus(U\times T)$ is not necessarily $1k$-insensitive, but for every $x\in[k]^m$
the set $\{y:(x,y)\in D_1\}$ is $1k$-insensitive: this follows immediately from the fact that both
$D$ and $T$ are $1k$-insensitive. Thus, we can at least partition $[k]^n$ into subspaces inside each
of which $D_1$ is $1k$-insensitive.

This gives us the basis for an inductive argument. The inductive hypothesis is that $D_r$ is
a set of density at least $3\eta$ such that for every $x\in[k]^{rm}$ the set $\{y\in[k]^{n-rm}:(x,y)\in D_r\}$
is $1k$-insensitive, and that $D\setminus D_r$ is a union of $d$-dimensional subspaces of density at least
$\eta(k+d)^{-m}$. We have essentially just given the proof of the inductive step, but we need to
generalize the argument very slightly.

To do this, let us write a typical element of $D_r$ as $(x,y,z)$ with $x\in[k]^{rm}$, $y\in[k]^m$
and $z\in[k]^{n-(r+1)m}$. For each $x\in[k]^{rm}$ let $(D_r)_x$ be $\{(y,z)\in[k]^{n-rm}:(x,y,z)\in D_r\}$
and for each pair $(x,z)$, let $(E_r)_{x,z}$ be the set $\{y\in[k-1]^m:(x,y,z)\in D_r\}$. Then the
average density of the sets $(D_r)_x$ is the density of $D_r$, which is at least $3\eta$. It follows
from Lemma 6.8 that the average density of the sets $(E_r)_{x,z}$ is at least $2\eta$, provided that
$n-rm\geq(12/\eta)^{12}$. Therefore, the density of pairs $(x,z)$ such that $(E_r)_{x,z}$ has density at
least $\eta$ is at least $\eta$.

If $(E_r)_{x,z}$ has density at least $\eta$, then it contains a $d$-dimensional combinatorial subspace
$U'_{x,z}$, where this is a subspace of $[k-1]^m$. Since $(D_r)_x$ is $1k$-insensitive, it follows that
it also contains a $d$-dimensional combinatorial subspace $U_{x,z}$, where this time we mean a
subspace of $[k]^m$. By the pigeonhole principle there is a $d$-dimensional subspace $U\subset[k]^m$
such that the set $T=\{(x,z)\in[k]^{rm}\times[k]^{n-(r+1)m}:\{x\}\times U\times\{z\}\subset D_r\}$ has density at
least $\eta(k+d)^{-m}$.

Let $D_{r+1}=D_r\setminus(T\times U)$ (where we interpret $T\times U$ to mean $\{(x,y,z):(x,z)\in T,y\in U\}$).
Then $T\times U$ is a union of $d$-dimensional subspaces of density at least $\eta(k+d)^{-m}$ and for
every $(x,y)$ the set $\{z\in[k]^{n-(r+1)m}:(x,y,z)\in D_{r+1}\}$ is $1k$-insensitive.

Clearly we cannot iterate this process more than $\eta^{-1}(k+d)^m$ times. Therefore, since
$n\geq\eta^{-1}m(k+d)^m$, it follows that we can write $D$ as a disjoint union of $d$-dimensional
combinatorial subspaces and a residual set of density at most $3\eta$, as claimed. □

8.2. An intersection of $jk$-insensitive sets can be almost entirely partitioned into
large subspaces. The main result of this subsection is a very straightforward consequence
of Lemma 8.1. Let $F$ be the function that bounds $n$ in terms of $d$ in that lemma (and
also $\eta$ and $k$, which we shall regard as fixed): that is, $F(d)=\lceil\eta^{-1}m(k+d)^m\rceil$, where
$m=$ MDHJ$_{k-1}(d,\eta)$. Let $F^{(k-1)}(d)$ denote the result of applying $F$ to $d$ $k-1$ times.

Lemma 8.2. Let $\eta>0$, and let $d$ and $n$ be positive integers such that $n\geq F^{(k-1)}(d)$. For
each $j\in[k-1]$ let $D_j$ be a $jk$-insensitive subset of $[k]^n$ and let $D=D_1\cap\dots\cap D_{k-1}$. Then
there are disjoint combinatorial subspaces $V_1,\dots,V_N$, each of which has dimension $d$ and
is a subset of $D$, such that $\mu(V_1\cup\dots\cup V_N)\geq\mu(D)-3(k-1)\eta$.

Proof. We prove the result by induction on the number of insensitive sets in the intersection
(which is not quite the same as proving it by induction on $k$). That is, we prove by induction
that if $n\geq F^{(j)}(d)$ then the conclusion of the lemma holds for $D^{(j)}=D_1\cap\dots\cap D_j$ and
with an error of at most $3j\eta$ instead of $3(k-1)\eta$.

Lemma 8.1 does the case $j=1$. In general, if we have the result for $j-1$, then let
$n\geq F^{(j)}(d)=F(F^{(j-1)}(d))$. Then by Lemma 8.1 we can partition $D_j$ into combinatorial
subspaces $V_1,\dots,V_N$ of dimension $F^{(j-1)}(d)$ together with a residual set of density at most
$3\eta$. The intersection of any $D_h$ with any $V_i$ is $hk$-insensitive, and $V_i\subset D_j$, so

\[ D^{(j)}\cap V_i=D^{(j-1)}\cap V_i=(D_1\cap V_i)\cap\dots\cap(D_{j-1}\cap V_i) \]

is an intersection of insensitive sets to which we can apply the inductive hypothesis.

That allows us to partition each $D^{(j)}\cap V_i$ into combinatorial subspaces $V_{is}$ of dimension $d$
together with a residual set of relative density (in $V_i$) at most $3(j-1)\eta$. The union of these
new residual sets has density at most $3(j-1)\eta$ in $[k]^n$ (since the subspaces $V_i$ are disjoint),
so we have partitioned $D^{(j)}$ into a union of $d$-dimensional combinatorial subspaces together
with a residual set of density at most $3j\eta$. This completes the inductive step. □

9. Completing the proof

At this stage our argument is essentially finished. In this section we shall spell out
why our lemmas show that DHJ$_k$ follows from DHJ$_{k-1}$. We shall begin with a qualitative
argument. After that, we shall informally discuss how the bounds we obtain for DHJ$_k$
depend on those that we obtain for DHJ$_{k-1}$. Finally, we shall exploit the fact that we have
good bounds when $k=2$ to give a more careful analysis of the bounds we obtain for DHJ$_3$,
which turn out to be of tower type.

9.1. Proof that DHJ$_{k-1}$ implies DHJ$_k$. Let $A\subset[k]^n$ be a set of density $\delta$. Our aim
will be to find a combinatorial subspace $V$ of dimension tending to infinity with $n$ such
that the relative density of $A\cap V$ in $V$ is at least $\delta+c$, where $c$ depends only on $\delta$ and $k$.
If we can do that, then we will be able to apply a simple iterative argument to complete
the proof.

Lemma 7.6 says that either $A$ contains a combinatorial line or we can find an $r$-
dimensional subspace $W$ and subsets $D_1,\dots,D_{k-1}$ of $W$ such that if $D=D_1\cap\dots\cap D_{k-1}$
then $\mu_W(D)$ (the density of $D$ inside $W$) is at least $\gamma$ and $\mu_W(A\cap D)\geq(\delta+\gamma)\mu_W(D)$.
Here, $r$ tends to infinity with $n$ for given $\delta$ and $k$ (and increases as $\delta$ increases), and $\gamma$ is
a parameter that depends on $\delta$ and $k$ only. To be precise, if we let $\theta=$ PDHJ$(k-1,\delta/4)$,
then we can take $\gamma=\delta\theta/24k$ and $r=\lfloor\delta\theta\lfloor n^{1/4}\rfloor/96k^3\rfloor$. Thus, this step depends on the
fact that DHJ$_{k-1}$ implies PDHJ$_{k-1}$.

Now apply Lemma 8.2 with $[k]^n$ replaced by the $r$-dimensional subspace $W$ and with
$\eta=\gamma^2/6(k-1)$. Then we can find disjoint combinatorial subspaces $V_1,\dots,V_N$ of $W$ such
that each has dimension equal to the largest $d$ for which $r\geq F^{(k-1)}(d)$, each is a subset of
$D$, and $\mu_W(V_1\cup\dots\cup V_N)\geq\mu_W(D)-\gamma^2/2$. Here $d$ depends on $\eta$ and $k$ as well as $r$ (the
dependence was suppressed in our notation for the function $F$) and tends to infinity as $r$
tends to infinity. The function $F$ is defined in terms of the function MDHJ$_{k-1}$, so this step
depends on the fact that DHJ$_{k-1}$ implies MDHJ$_{k-1}$.

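It follows that

\begin{align*}
\mu_W\bigl(A\cap(V_1\cup\dots\cup V_N)\bigr)&\geq(\delta+\gamma)\mu_W(D)-\gamma^2/2\\
&\geq(\delta+\gamma/2)\mu_W(D)\\
&\geq(\delta+\gamma/2)\mu_W(V_1\cup\dots\cup V_N).
\end{align*}

Therefore, by averaging there must be some $i$ such that $\mu_W(A\cap V_i)\geq(\delta+\gamma/2)\mu_W(V_i)$.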
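
The middle inequality in this chain uses only the lower bound $\mu_W(D)\geq\gamma$, which gives $\gamma^2/2\leq(\gamma/2)\mu_W(D)$. A small exact check in Python, with sample values of $\delta$ and $\gamma$ that are ours and purely illustrative:

```python
from fractions import Fraction


def increment_chain_holds(delta, gamma, mu_D):
    """Check (delta + gamma) * mu - gamma^2 / 2 >= (delta + gamma / 2) * mu for mu = mu_W(D)."""
    return (delta + gamma) * mu_D - gamma**2 / 2 >= (delta + gamma / 2) * mu_D


delta, gamma = Fraction(1, 5), Fraction(1, 50)

# mu_W(D) ranges over 2/100 (= gamma), 3/100, ..., 1: the inequality holds throughout.
holds_above_gamma = all(
    increment_chain_holds(delta, gamma, Fraction(t, 100)) for t in range(2, 101)
)
# Below the threshold gamma the same inequality can fail.
fails_below_gamma = not increment_chain_holds(delta, gamma, Fraction(1, 100))
```

This is why the lower bound on the density of $D$ inside $W$ matters: without it, the loss of $\gamma^2/2$ from the residual set could swallow the increment.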

Since $d$, the dimension of $V_i$, tends to infinity with $r$ and $r$ tends to infinity with $n$,
and since $\gamma$ depends on $\delta$ and $k$ only, we have found our desired density increment on a
subspace. We may now repeat the argument. Either $A\cap V_i$ contains a combinatorial line,
or we can pass to a further subspace (with dimension tending to infinity with $d$ and hence
with $n$) inside which the relative density is at least $\delta+\gamma$. (In fact, we can do slightly
better, since we have now replaced $\delta$ by $\delta+\gamma/2$ so the density increment at this second
stage will be better than $\gamma/2$.) Since the density of $A$ inside any subspace is always at
most 1, there can be at most $2/\gamma$ iterations of this procedure before we eventually find a
combinatorial line. Since this number of iterations depends only on $\delta$ and $k$, if the original
$n$ is large enough, $A$ must have contained a combinatorial line.

Since DHJ$_1$ is trivial and DHJ$_2$ follows from Sperner's theorem, the proof of the general
case of DHJ is complete.

9.2. What bound comes out of the above argument? Let us briefly consider how
the bound that we obtain for DHJ$_k$ relates to the bound that we obtain for DHJ$_{k-1}$.

We note first that EDHJ(/c-l, 6) is bounded above by {IGk"^ /6)DB.J{k, 8/2), by Corollary 
6.7 (but all we really care about for the purposes of this discussion is that the two functions
are of broadly similar type). Next, recall from Theorem 3.7 that if $A\subset[k-1]^n$ has equal-
slices density at least $\delta$, then the equal-slices density of the set of combinatorial lines in $A$ is
at least $(\delta/9)k^{-m}$, where $m=$ EDHJ$(k-1,\delta/4)$. That is, PDHJ$(k-1,\delta)$ is exponentially
small as a function of EDHJ$(k-1,\delta)$, and hence as a function of DHJ$(k-1,\delta)$. In
particular, if DHJ$(k-1,\delta)$ is already a tower-type function, then PDHJ$(k-1,\delta)$ behaves
broadly like the reciprocal of DHJ$(k-1,\delta)$. It follows that the subspace we pass to in
Lemma 7.6 has dimension broadly comparable to $n/$DHJ$(k-1,\delta)$. Equivalently, if we
want to pass to an $r$-dimensional subspace then we need $n$ to be at least $r\,$DHJ$(k-1,\delta)$
or so.

The next step depends on MDHJ$_{k-1}$, and this is where things get very expensive. The
proof we gave of MDHJ$_{k-1}$ yields a bound that is obtained as follows. Define $G_{k-1}(x)$ to
be $\exp(\mathrm{DHJ}(k-1,1/x))$. Then MDHJ$(k-1,d,\delta)$ is bounded above by $G_{k-1}^{(d)}(1/\delta)$, where
$G_{k-1}^{(d)}$ is the $d$-fold iteration of $G_{k-1}$. The function $F$ defined in §8.2 is broadly
comparable to MDHJ$(k-1,d,\delta)$ (again, assuming that MDHJ$(k-1,d,\delta)$ is at least of
tower type), so $F^{(k-1)}$ is something like $G_{k-1}^{(d(k-1))}$.

This function is so much bigger than the function $r\mapsto r\,$DHJ$(k-1,\delta)$ that we can more
or less ignore the latter. Therefore, if we want to end up with a $d$-dimensional subspace
after one round of the main iteration, we need to start with $n$ being something like the
$d(k-1)$-fold iteration of a function that has similar behaviour to the function $G_{k-1}$ defined
above, which is pretty similar to the function $d\mapsto$ DHJ$(k-1,1/d)$. We then have to run
the whole iteration $2/\gamma$ times, where $\gamma$ is broadly comparable to DHJ$(k-1,\delta)^{-1}$. So
eventually we need $n$ to be larger than $(k-1)d\,$DHJ$(k-1,\delta)$ iterations of the function
$d\mapsto$ DHJ$(k-1,1/d)$, which is roughly DHJ$(k-1,\delta)$ iterations.

To rephrase slightly, if we let RDHJ$_{k-1}(s)=$ DHJ$(k-1,1/s)$ (the "R" stands for
"reciprocal" here), then RDHJ$_k(s)$ is obtained by iterating the function RDHJ$_{k-1}$ roughly
RDHJ$_{k-1}(s)$ times.

This means that as $k$ increases by 1, the function RDHJ$_k$ goes up by one level in the
Ackermann hierarchy. (It is bigger than the corresponding level of the Ackermann function,
but not in an interesting way.)

9.3. Bounds for DHJ$_3$. When $k=3$, we can obtain much better bounds because in this
case we have reasonable bounds for MDHJ$_{k-1}$. Let us therefore do the analysis a little
more carefully.

First, note that Theorems 3.1 and 2.3 tell us that we can take PDHJ$(2,\delta)$ to be $\delta^2/2$
and MDHJ$(2,d,\delta)$ to be $25\delta^{-2^d}$. Therefore, returning to the argument given in §9.1 and
setting $k=3$, we can take $\theta$ to be $\delta^2/32$, $\gamma=\delta^3/2304$, and $r=\lfloor\delta^3\lfloor n^{1/4}\rfloor/41472\rfloor$.

We apply Lemma 8.2 with $\eta=\gamma^2/6(k-1)=\delta^6/12(2304)^2$, which is at least $\delta^6/2^{27}$.
Therefore, MDHJ$(2,d,\eta)$ is at most $25(2^{27}\delta^{-6})^{2^d}$, and if $d>10$, say, then $F(d)$ can be
bounded above by $2\uparrow\delta^{-1}\uparrow2\uparrow2d$, where the symbol $\uparrow$ denotes exponentiation and
$x\uparrow y\uparrow z$ means $x\uparrow(y\uparrow z)$. It follows that $F^{(2)}(d)$ is at most $2\uparrow\delta^{-1}\uparrow2\uparrow2\uparrow2\uparrow3d$.
(The final 3 instead of 2 is to (over)compensate for losing a factor of 2 earlier on in the
tower.)

We may therefore take $d$ to be $\delta\log^{(6)}r$, where $\log^{(6)}$ is the six-fold iterated logarithm.
In fact, the factor of $\delta$ is unduly generous, so, bearing in mind our bound for $r$ in terms of
$n$, it is safe to take $d$ to be $(\delta/2)\log^{(6)}n$. (Strictly speaking, we need to assume that $n$ is
sufficiently large, but if we are generous later then this requirement will be met by a huge
margin.)
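
The arrow notation and the iterated logarithm are easy to misread, so here is a small Python sketch of both. It assumes, as stated above, that a chain of arrows is bracketed from the right; the function names `up` and `iterated_log2` are ours, and base 2 is chosen only for convenience.

```python
import math


def up(*terms):
    """Evaluate x1 ^ x2 ^ ... ^ xn with the brackets nested from the right."""
    value = 1
    for t in reversed(terms):
        value = t ** value
    return value


def iterated_log2(x, times):
    """Apply log base 2 to x the given number of times."""
    for _ in range(times):
        x = math.log2(x)
    return x


right_nested = up(2, 3, 2)       # 2 ^ (3 ^ 2) = 2 ^ 9
left_nested = (2 ** 3) ** 2      # 8 ^ 2, a different number
small_tower = up(2, 2, 2, 2)     # 2 ^ (2 ^ (2 ^ 2)) = 2 ^ 16
undone = iterated_log2(small_tower, 3)
```

Each application of the logarithm strips one level from a tower, which is why a tower of six arrows bounding $F^{(2)}(d)$ translates into a six-fold iterated logarithm for $d$.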

The number of iterations we need is certainly no more than $2304/\delta^3$, but we can in fact do
slightly better. It takes at most $2304/\delta^2$ iterations for the density to increase from $\delta$ to $2\delta$.
Therefore, the total number of iterations is at most $2304\delta^{-2}(1+1/4+1/16+\dots)=3072\delta^{-2}$.
It follows that DHJ$(3,\delta)$ is bounded above by a tower of 2s of height $20000\delta^{-2}$. (Since
$20000>6\times3072$, the dimension of the space will still be vast when the iterations come to
an end.) This proves the estimate claimed in Theorem 1.5.
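
The numerical constants in this subsection can be rechecked with exact arithmetic. The sketch below assumes the formulas $\theta=$ PDHJ$(2,\delta/4)$ with PDHJ$(2,\delta)=\delta^2/2$, $\gamma=\delta\theta/24k$ and $\eta=\gamma^2/6(k-1)$ with $k=3$; the sample value of $\delta$ is ours, and the identities do not depend on it.

```python
from fractions import Fraction

k = 3
delta = Fraction(1, 10)               # sample density, illustrative only

theta = (delta / 4) ** 2 / 2          # PDHJ(2, delta/4) with PDHJ(2, x) = x^2 / 2
gamma = delta * theta / (24 * k)
eta = gamma**2 / (6 * (k - 1))

# Doubling the density from 2^j * delta takes at most 2304 / (2^j * delta)^2 steps,
# so the totals form a geometric series with ratio 1/4, whose sum is 4/3.
series_factor = 1 / (1 - Fraction(1, 4))
total_over_delta_sq = 2304 * series_factor   # coefficient of delta^(-2)

# Each iteration costs six levels of the tower (the six-fold logarithm).
levels_used = 6 * total_over_delta_sq
```

With these values one finds $\theta=\delta^2/32$, $\gamma=\delta^3/2304$ and a total coefficient of 3072, and $6\times3072$ is below 20000, in agreement with the text.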